goquery

v0.1.1
Published: Sep 17, 2012 License: BSD-3-Clause Imports: 7 Imported by: 0

README

goquery - a little like that j-thing, only in Go

GoQuery brings a syntax and a set of features similar to jQuery to the Go language. It is based on the experimental html package and the CSS Selector library cascadia. Since the experimental html parser returns tokens (nodes), and not a full-featured DOM object, jQuery's manipulation and modification functions have been left out (there is no point in modifying data in the parsed HTML tree, since it has no effect).

Supported functions are query-oriented features (hasClass(), attr() and the like), as well as traversing functions that make sense given what we have to work with. This makes GoQuery a great library for scraping web pages.

Syntax-wise, it is as close as possible to jQuery, with the same function names when possible, and that warm and fuzzy chainable interface. jQuery being the ultra-popular library that it is, I felt that a similar HTML-manipulating library was better off following its API than starting anew (in the same spirit as Go's fmt package), even though some of its methods are less than intuitive (looking at you, index()...).

Installation

Since this library (and cascadia) depends on the experimental html package, that package must be installed first. Both GoQuery and Cascadia expect to find the experimental library with the "exp/html" import statement. To install it at this location, please follow this guide: http://code.google.com/p/go-wiki/wiki/InstallingExp

Once this is done, install GoQuery:

go get github.com/PuerkitoBio/goquery

To run unit tests, run this command in goquery's source directory ($GOPATH/src/github.com/PuerkitoBio/goquery):

go test

To run benchmarks, run this command in goquery's source directory:

go test -bench=".*"

Changelog

  • v0.1 : Initial release. See TODOs for a list of upcoming features.

API

GoQuery exposes two types, Document and Selection. Unlike jQuery, which is loaded as part of a DOM document, and thus acts on its containing document, GoQuery doesn't know which HTML document to act upon. So it needs to be told, and that's what the Document type is for. It holds the root document node as the initial Selection object to manipulate.

jQuery often has many variants for the same function (no argument, a selector string argument, a jQuery object argument, a DOM element argument, ...). Instead of exposing the same features in GoQuery as a single method with variadic empty interface arguments, I use statically-typed signatures following this naming convention:

  • When the jQuery equivalent can be called with no argument, it has the same name as jQuery for the no argument signature (e.g.: Prev()), and the version with a selector string argument is called XxxFiltered() (e.g.: PrevFiltered())
  • When the jQuery equivalent requires one argument, the same name as jQuery is used for the selector string version (e.g.: Is())
  • The signatures accepting a jQuery object as argument are defined in GoQuery as XxxSelection() and take a *Selection object as argument (e.g.: FilterSelection())
  • The signatures accepting a DOM element as argument in jQuery are defined in GoQuery as XxxNodes() and take a variadic argument of type *html.Node (e.g.: FilterNodes())
  • Finally, the signatures accepting a function as argument in jQuery are defined in GoQuery as XxxFunction() and take a function as argument (e.g.: FilterFunction())
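The naming convention above can be sketched with a toy type. This is not goquery's implementation: `node` and `selection` are hypothetical stand-ins for `*html.Node` and `*Selection`, the selector matching is a plain tag comparison, and the callback receives a node rather than a `*Selection`. The sketch only shows how one overloaded jQuery function might become several statically-typed methods.

```go
package main

import "fmt"

// node is a hypothetical stand-in for *html.Node.
type node struct{ tag string }

// selection is a hypothetical stand-in for goquery's Selection.
type selection struct{ nodes []*node }

// Filter keeps the nodes whose tag equals the selector (toy matching only).
func (s *selection) Filter(selector string) *selection {
	return s.FilterFunction(func(_ int, n *node) bool { return n.tag == selector })
}

// FilterNodes keeps the nodes that are also present in the given list.
func (s *selection) FilterNodes(nodes ...*node) *selection {
	return s.FilterFunction(func(_ int, n *node) bool {
		for _, m := range nodes {
			if m == n {
				return true
			}
		}
		return false
	})
}

// FilterSelection keeps the nodes that are also present in sel.
func (s *selection) FilterSelection(sel *selection) *selection {
	return s.FilterNodes(sel.nodes...)
}

// FilterFunction keeps the nodes for which f returns true.
func (s *selection) FilterFunction(f func(int, *node) bool) *selection {
	out := &selection{}
	for i, n := range s.nodes {
		if f(i, n) {
			out.nodes = append(out.nodes, n)
		}
	}
	return out
}

func main() {
	a, b, c := &node{"div"}, &node{"p"}, &node{"div"}
	s := &selection{nodes: []*node{a, b, c}}
	fmt.Println(len(s.Filter("div").nodes))                  // 2
	fmt.Println(len(s.FilterNodes(b).nodes))                 // 1
	fmt.Println(len(s.FilterSelection(s.Filter("p")).nodes)) // 1
}
```

Each variant has its own name and argument type, so a wrong argument is caught at compile time instead of by a runtime type switch on an empty interface.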

GoQuery's complete godoc reference documentation can be found here.

Please note that Cascadia's selectors do NOT necessarily match all supported selectors of jQuery (Sizzle). See the cascadia project for details.

Examples

Taken from example_test.go:

import (
  "fmt"
  // In real use, this import would be required (not in this example, since it
  // is part of the goquery package)
  //"github.com/PuerkitoBio/goquery"
  "strconv"
)

// This example scrapes the 10 reviews shown on the home page of MetalReview.com,
// the best metal review site on the web :) (and no, I'm not affiliated to them!)
func ExampleScrape_MetalReview() {
  // Load the HTML document (in real use, the type would be *goquery.Document)
  var doc *Document
  var e error
  if doc, e = NewDocument("http://metalreview.com"); e != nil {
    panic(e.Error())
  }

  // Find the review items (the type of the Selection would be *goquery.Selection)
  doc.Root.Find(".slider-row:nth-child(1) .slider-item").Each(func(i int, s *Selection) {
    var band, title string
    var score float64

    // For each item found, get the band, title and score, and print it
    band = s.Find("strong").Text()
    title = s.Find("em").Text()
    if score, e = strconv.ParseFloat(s.Find(".score").Text(), 64); e != nil {
      // Not a valid float, ignore score
      fmt.Printf("Review %d: %s - %s.\n", i, band, title)
    } else {
      // Print all, including score
      fmt.Printf("Review %d: %s - %s (%2.1f).\n", i, band, title, score)
    }
  })
  // To see the output of the Example while running the test suite (go test), simply
  // remove the leading "x" before Output on the next line. This will cause the
  // example to fail (all the "real" tests should pass).

  // xOutput: voluntarily fail the Example output.
}

TODOs

  • Add jQuery's Closest()? Other missing functions?
  • Support negative indices in Slice(), like jQuery.

License

The BSD 3-Clause license, the same as the Go language. Cascadia's license is here.

Documentation

Overview

Package goquery implements features similar to jQuery, including the chainable syntax, to manipulate and query an HTML document (the modification functions of jQuery are not included).

It depends on Go's experimental html package, which must be installed so that it can be imported as "exp/html". See this tutorial on how to install it accordingly: http://code.google.com/p/go-wiki/wiki/InstallingExp

It uses Cascadia as its CSS selector library (similar to Sizzle for jQuery). This dependency is automatically installed when using "go get ..." to install GoQuery.

To provide a chainable interface, error management is strict, and goquery panics if an invalid Cascadia selector is used (this is consistent with the behavior of jQuery/Sizzle/document.querySelectorAll, where an error is thrown). This is necessary since multiple return values cannot be used to allow a chainable interface.
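The trade-off described above can be illustrated with a small sketch. Everything here is hypothetical: the `query` type, its toy validity check (non-empty, balanced brackets) and the `safeFind` helper are not part of goquery or Cascadia. The sketch shows why a chainable method cannot also return an error, and how a caller could turn the panic back into an error with `recover` if that is preferred.

```go
package main

import (
	"fmt"
	"strings"
)

// query is a hypothetical chainable type; it is not part of goquery.
type query struct{ steps []string }

// Find records a selector and returns the receiver so calls can be chained.
// It panics on a selector this toy considers invalid (empty or unbalanced
// brackets), because returning (*query, error) would break the chain.
func (q *query) Find(selector string) *query {
	if selector == "" || strings.Count(selector, "[") != strings.Count(selector, "]") {
		panic(fmt.Sprintf("invalid selector: %q", selector))
	}
	q.steps = append(q.steps, selector)
	return q
}

// safeFind applies the selectors in order and converts a panic into an error.
func safeFind(selectors ...string) (q *query, err error) {
	defer func() {
		if r := recover(); r != nil {
			q, err = nil, fmt.Errorf("%v", r)
		}
	}()
	q = &query{}
	for _, s := range selectors {
		q.Find(s)
	}
	return q, nil
}

func main() {
	q, err := safeFind("div", "a[href]")
	fmt.Println(len(q.steps), err) // 2 <nil>

	_, err = safeFind("div", "a[href")
	fmt.Println(err)
}
```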

It is hosted on GitHub, along with additional documentation in the README.md file: https://github.com/puerkitobio/goquery

The various methods are split into files based on the category of behavior:

* array.go : array-like positional manipulation of the selection.

  • Eq()
  • First()
  • Get()
  • Index...()
  • Last()
  • Slice()

* expand.go : methods that expand or augment the selection's set.

  • Add...()
  • AndSelf()
  • Union(), which is an alias for AddSelection()

* filter.go : filtering methods, that reduce the selection's set.

  • End()
  • Filter...()
  • Has...()
  • Intersection(), which is an alias of FilterSelection()
  • Not...()

* iteration.go : methods to loop over the selection's nodes.

  • Each()
  • Map()

* property.go : methods that inspect and get the node's property values.

  • Attr()
  • Html()
  • Length()
  • Size(), which is an alias for Length()
  • Text()

* query.go : methods that query, or reflect, a node's identity.

  • Contains()
  • HasClass()
  • Is...()

* traversal.go : methods to traverse the HTML document tree.

  • Children...()
  • Contents()
  • Find...()
  • Next...()
  • Parent[s]...()
  • Prev...()
  • Siblings...()

* type.go : definition of the types exposed by GoQuery.

  • Document
  • Selection

Types

type Document

type Document struct {
	Root *Selection
	Url  *url.URL
	// contains filtered or unexported fields
}

Document represents an HTML document to be manipulated. Unlike jQuery, which is loaded as part of a DOM document, and thus acts upon its containing document, GoQuery doesn't know which HTML document to act upon. So it needs to be told, and that's what the Document type is for. It holds the root document node to manipulate, and can make selections on this document.

func NewDocument

func NewDocument(url string) (d *Document, e error)

NewDocument() is a Document constructor that takes a string URL as argument. It loads the specified document, parses it, and stores the root Document node, ready to be manipulated.

func NewDocumentFromNode

func NewDocumentFromNode(root *html.Node) (d *Document)

NewDocumentFromNode() is a Document constructor that takes a root html Node as argument.

type Selection

type Selection struct {
	Nodes []*html.Node
	// contains filtered or unexported fields
}

Selection represents a collection of nodes matching some criteria. The initial Selection is available as the Document's Root field; further selections can be created with Root.Find(), and then manipulated using the jQuery-like chainable syntax and methods.

func (*Selection) Add

func (this *Selection) Add(selector string) *Selection

Add() adds the selector string's matching nodes to those in the current selection and returns a new Selection object. The selector string is run in the context of the document of the current Selection object.

func (*Selection) AddNodes

func (this *Selection) AddNodes(nodes ...*html.Node) *Selection

AddNodes() adds the specified nodes to those in the current selection and returns a new Selection object.

func (*Selection) AddSelection

func (this *Selection) AddSelection(sel *Selection) *Selection

AddSelection() adds the specified Selection object's nodes to those in the current selection and returns a new Selection object.

func (*Selection) AndSelf

func (this *Selection) AndSelf() *Selection

AndSelf() adds the previous set of elements on the stack to the current set. It returns a new Selection object containing the current Selection combined with the previous one.

func (*Selection) Attr

func (this *Selection) Attr(attrName string) (val string, exists bool)

Attr() gets the specified attribute's value for the first element in the Selection. To get the value for each element individually, use a looping construct such as the Each() or Map() method.
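The two return values follow Go's usual "comma ok" pattern. The sketch below mimics that shape with hypothetical `attribute` and `element` types standing in for the html package's types; it is not goquery's code, and the behaviour on an empty set is an assumption of the sketch.

```go
package main

import "fmt"

// attribute and element are hypothetical stand-ins for the html package's types.
type attribute struct{ key, val string }
type element struct{ attrs []attribute }

// attr mirrors the (val, exists) shape of Attr(): it looks only at the first
// element, and exists distinguishes a missing attribute from an empty one.
func attr(elems []*element, name string) (val string, exists bool) {
	if len(elems) == 0 {
		return "", false
	}
	for _, a := range elems[0].attrs {
		if a.key == name {
			return a.val, true
		}
	}
	return "", false
}

func main() {
	elems := []*element{
		{attrs: []attribute{{"href", "/a"}, {"title", ""}}},
		{attrs: []attribute{{"href", "/b"}}},
	}
	for _, name := range []string{"href", "title", "id"} {
		val, ok := attr(elems, name)
		fmt.Printf("%s: %q %v\n", name, val, ok)
	}
}
```

Only the first element is consulted, so the second element's "/b" never appears; the boolean is what tells an empty `title` apart from the absent `id`.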

func (*Selection) Children

func (this *Selection) Children() *Selection

Children() gets the child elements of each element in the Selection. It returns a new Selection object containing these elements.

func (*Selection) ChildrenFiltered

func (this *Selection) ChildrenFiltered(selector string) *Selection

ChildrenFiltered() gets the child elements of each element in the Selection, filtered by the specified selector. It returns a new Selection object containing these elements.

func (*Selection) Contains

func (this *Selection) Contains(n *html.Node) bool

Contains() returns true if the specified Node is within, at any depth, one of the nodes in the Selection object. It is NOT inclusive, to behave like jQuery's implementation, and unlike JavaScript's .contains(), so if the contained node is itself in the selection, it returns false.
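A minimal sketch of the non-inclusive check, assuming a hypothetical `node` type that keeps only a parent link (the real `*html.Node` has more fields, and goquery's own implementation may differ):

```go
package main

import "fmt"

// node is a hypothetical stand-in for *html.Node, keeping only the parent link.
type node struct {
	name   string
	parent *node
}

// contains reports whether n is a strict descendant of any node in sel.
// The walk starts at n's parent, so a node never contains itself.
func contains(sel []*node, n *node) bool {
	for p := n.parent; p != nil; p = p.parent {
		for _, s := range sel {
			if s == p {
				return true
			}
		}
	}
	return false
}

func main() {
	root := &node{name: "html"}
	body := &node{name: "body", parent: root}
	para := &node{name: "p", parent: body}

	sel := []*node{body}
	fmt.Println(contains(sel, para)) // true: p is inside body
	fmt.Println(contains(sel, body)) // false: body is the selected node itself
	fmt.Println(contains(sel, root)) // false: html is an ancestor, not a descendant
}
```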

func (*Selection) Contents

func (this *Selection) Contents() *Selection

Contents() gets the children of each element in the Selection, including text and comment nodes. It returns a new Selection object containing these elements.

func (*Selection) ContentsFiltered

func (this *Selection) ContentsFiltered(selector string) *Selection

ContentsFiltered() gets the children of each element in the Selection, filtered by the specified selector. It returns a new Selection object containing these elements. Since selectors only act on Element nodes, this function is an alias to ChildrenFiltered() unless the selector is empty, in which case it is an alias to Contents().

func (*Selection) Each

func (this *Selection) Each(f func(int, *Selection)) *Selection

Each() iterates over a Selection object, executing a function for each matched element. It returns the current Selection object.

func (*Selection) End

func (this *Selection) End() *Selection

End() ends the most recent filtering operation in the current chain and returns the set of matched elements to its previous state.
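One way to picture End() is as popping a stack of selections. The sketch below uses a hypothetical `selection` type over strings, with a `prev` pointer linking each derived selection to the one it came from; returning an empty selection at the bottom of the stack is an assumption of the sketch, not documented goquery behaviour.

```go
package main

import "fmt"

// selection is a hypothetical stand-in for goquery's Selection. Each derived
// selection remembers the one it came from, forming a stack.
type selection struct {
	items []string
	prev  *selection
}

// Filter returns a new selection holding the items accepted by keep,
// and records the receiver as its previous state.
func (s *selection) Filter(keep func(string) bool) *selection {
	out := &selection{prev: s}
	for _, it := range s.items {
		if keep(it) {
			out.items = append(out.items, it)
		}
	}
	return out
}

// End pops back to the previous selection; at the bottom of the stack it
// returns an empty selection (an assumption of this sketch).
func (s *selection) End() *selection {
	if s.prev == nil {
		return &selection{}
	}
	return s.prev
}

func main() {
	all := &selection{items: []string{"a", "bb", "ccc"}}
	long := all.Filter(func(it string) bool { return len(it) > 1 })
	fmt.Println(long.items)       // [bb ccc]
	fmt.Println(long.End().items) // [a bb ccc]
}
```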

func (*Selection) Eq

func (this *Selection) Eq(index int) *Selection

Eq() reduces the set of matched elements to the one at the specified index. If a negative index is given, it counts backwards starting at the end of the set. It returns a new Selection object, and an empty Selection object if the index is invalid.
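The index handling described for Eq() can be sketched on a plain slice. The `eq` helper below is hypothetical and works on strings rather than nodes; it only illustrates the negative-index and out-of-range rules stated above.

```go
package main

import "fmt"

// eq returns a one-element slice holding the item at index, counting from the
// end when index is negative, or an empty slice when index is out of range.
func eq(items []string, index int) []string {
	if index < 0 {
		index += len(items)
	}
	if index < 0 || index >= len(items) {
		return []string{}
	}
	return items[index : index+1]
}

func main() {
	items := []string{"a", "b", "c"}
	fmt.Println(eq(items, 0))  // [a]
	fmt.Println(eq(items, -1)) // [c]
	fmt.Println(eq(items, 3))  // []
	fmt.Println(eq(items, -4)) // []
}
```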

func (*Selection) Filter

func (this *Selection) Filter(selector string) *Selection

Filter() reduces the set of matched elements to those that match the selector string. It returns a new Selection object for this subset of matching elements.

func (*Selection) FilterFunction

func (this *Selection) FilterFunction(f func(int, *Selection) bool) *Selection

FilterFunction() reduces the set of matched elements to those that pass the function's test. It returns a new Selection object for this subset of elements.

func (*Selection) FilterNodes

func (this *Selection) FilterNodes(nodes ...*html.Node) *Selection

FilterNodes() reduces the set of matched elements to those that match the specified nodes. It returns a new Selection object for this subset of elements.

func (*Selection) FilterSelection

func (this *Selection) FilterSelection(s *Selection) *Selection

FilterSelection() reduces the set of matched elements to those that match a node in the specified Selection object. It returns a new Selection object for this subset of elements.

func (*Selection) Find

func (this *Selection) Find(selector string) *Selection

Find() gets the descendants of each element in the current set of matched elements, filtered by a selector. It returns a new Selection object containing these matched elements.

func (*Selection) FindNodes

func (this *Selection) FindNodes(nodes ...*html.Node) *Selection

FindNodes() gets the descendants of each element in the current Selection, filtered by some nodes. It returns a new Selection object containing these matched elements.

func (*Selection) FindSelection

func (this *Selection) FindSelection(sel *Selection) *Selection

FindSelection() gets the descendants of each element in the current Selection, filtered by a Selection. It returns a new Selection object containing these matched elements.

func (*Selection) First

func (this *Selection) First() *Selection

First() reduces the set of matched elements to the first in the set. It returns a new Selection object.

func (*Selection) Get

func (this *Selection) Get(index int) *html.Node

Get() retrieves the underlying node at the specified index. Get() without parameter is not implemented, since the node array is available on the Selection object.

func (*Selection) Has

func (this *Selection) Has(selector string) *Selection

Has() reduces the set of matched elements to those that have a descendant that matches the selector. It returns a new Selection object with the matching elements.

func (*Selection) HasClass

func (this *Selection) HasClass(class string) bool

HasClass() determines whether any of the matched elements are assigned the given class.

func (*Selection) HasNodes

func (this *Selection) HasNodes(nodes ...*html.Node) *Selection

HasNodes() reduces the set of matched elements to those that have a descendant that matches one of the nodes. It returns a new Selection object with the matching elements.

func (*Selection) HasSelection

func (this *Selection) HasSelection(sel *Selection) *Selection

HasSelection() reduces the set of matched elements to those that have a descendant that matches one of the nodes of the specified Selection object. It returns a new Selection object with the matching elements.

func (*Selection) Html

func (this *Selection) Html() (ret string, e error)

Html() gets the HTML contents of the first element in the set of matched elements. It includes text and comment nodes.

func (*Selection) Index

func (this *Selection) Index() int

Index() returns the position of the first element within the Selection object relative to its sibling elements.

func (*Selection) IndexOfNode

func (this *Selection) IndexOfNode(node *html.Node) int

IndexOfNode() returns the position of the specified node within the Selection object, or -1 if not found.

func (*Selection) IndexOfSelection

func (this *Selection) IndexOfSelection(s *Selection) int

IndexOfSelection() returns the position of the first node in the specified Selection object within this Selection object, or -1 if not found.

func (*Selection) IndexSelector

func (this *Selection) IndexSelector(selector string) int

IndexSelector() returns the position of the first element within the Selection object relative to the elements matched by the selector, or -1 if not found.

func (*Selection) Intersection

func (this *Selection) Intersection(s *Selection) *Selection

Intersection() is an alias for FilterSelection().

func (*Selection) Is

func (this *Selection) Is(selector string) bool

Is() checks the current matched set of elements against a selector and returns true if at least one of these elements matches.

func (*Selection) IsFunction

func (this *Selection) IsFunction(f func(int, *Selection) bool) bool

IsFunction() checks the current matched set of elements against a predicate and returns true if at least one of these elements matches.

func (*Selection) IsNodes

func (this *Selection) IsNodes(nodes ...*html.Node) bool

IsNodes() checks the current matched set of elements against the specified nodes and returns true if at least one of these elements matches.

func (*Selection) IsSelection

func (this *Selection) IsSelection(s *Selection) bool

IsSelection() checks the current matched set of elements against a Selection object and returns true if at least one of these elements matches.

func (*Selection) Last

func (this *Selection) Last() *Selection

Last() reduces the set of matched elements to the last in the set. It returns a new Selection object.

func (*Selection) Length

func (this *Selection) Length() int

Length() returns the number of elements in the Selection object.

func (*Selection) Map

func (this *Selection) Map(f func(int, *Selection) string) (result []string)

Map() passes each element in the current matched set through a function, producing a slice of string holding the returned values.
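The shape of Map() can be sketched on a slice of strings. The `mapStrings` helper is hypothetical; in goquery the callback receives a `*Selection` for each element rather than a string.

```go
package main

import (
	"fmt"
	"strings"
)

// mapStrings calls f with each item's index and value and collects the results.
func mapStrings(items []string, f func(int, string) string) []string {
	result := make([]string, 0, len(items))
	for i, it := range items {
		result = append(result, f(i, it))
	}
	return result
}

func main() {
	tags := []string{"div", "p"}
	out := mapStrings(tags, func(i int, tag string) string {
		return fmt.Sprintf("%d:%s", i, strings.ToUpper(tag))
	})
	fmt.Println(out) // [0:DIV 1:P]
}
```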

func (*Selection) Next

func (this *Selection) Next() *Selection

Next() gets the immediately following sibling of each element in the Selection. It returns a new Selection object containing the matched elements.

func (*Selection) NextAll

func (this *Selection) NextAll() *Selection

NextAll() gets all the following siblings of each element in the Selection. It returns a new Selection object containing the matched elements.

func (*Selection) NextAllFiltered

func (this *Selection) NextAllFiltered(selector string) *Selection

NextAllFiltered() gets all the following siblings of each element in the Selection filtered by a selector. It returns a new Selection object containing the matched elements.

func (*Selection) NextFiltered

func (this *Selection) NextFiltered(selector string) *Selection

NextFiltered() gets the immediately following sibling of each element in the Selection filtered by a selector. It returns a new Selection object containing the matched elements.

func (*Selection) NextFilteredUntil

func (this *Selection) NextFilteredUntil(filterSelector string, untilSelector string) *Selection

NextFilteredUntil() is like NextUntil(), with the option to filter the results based on a selector string. It returns a new Selection object containing the matched elements.

func (*Selection) NextFilteredUntilNodes

func (this *Selection) NextFilteredUntilNodes(filterSelector string, nodes ...*html.Node) *Selection

NextFilteredUntilNodes() is like NextUntilNodes(), with the option to filter the results based on a selector string. It returns a new Selection object containing the matched elements.

func (*Selection) NextFilteredUntilSelection

func (this *Selection) NextFilteredUntilSelection(filterSelector string, sel *Selection) *Selection

NextFilteredUntilSelection() is like NextUntilSelection(), with the option to filter the results based on a selector string. It returns a new Selection object containing the matched elements.

func (*Selection) NextUntil

func (this *Selection) NextUntil(selector string) *Selection

NextUntil() gets all following siblings of each element up to but not including the element matched by the selector. It returns a new Selection object containing the matched elements.
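The "up to but not including" rule shared by the Until variants can be sketched over a flat list of sibling tag names. The `nextUntil` helper and the predicate standing in for a selector are hypothetical; goquery walks real sibling nodes instead.

```go
package main

import "fmt"

// nextUntil returns the siblings that follow position start, stopping before
// the first one for which stop returns true. The siblings slice is a
// hypothetical stand-in for the children of one parent node.
func nextUntil(siblings []string, start int, stop func(string) bool) []string {
	var out []string
	for _, s := range siblings[start+1:] {
		if stop(s) {
			break
		}
		out = append(out, s)
	}
	return out
}

func main() {
	siblings := []string{"h2", "p", "ul", "h2", "p"}
	isH2 := func(tag string) bool { return tag == "h2" }
	fmt.Println(nextUntil(siblings, 0, isH2)) // [p ul]
	fmt.Println(nextUntil(siblings, 3, isH2)) // [p]
}
```

The PrevUntil variants would apply the same rule while walking backwards from the starting element.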

func (*Selection) NextUntilNodes

func (this *Selection) NextUntilNodes(nodes ...*html.Node) *Selection

NextUntilNodes() gets all following siblings of each element up to but not including the element matched by the nodes. It returns a new Selection object containing the matched elements.

func (*Selection) NextUntilSelection

func (this *Selection) NextUntilSelection(sel *Selection) *Selection

NextUntilSelection() gets all following siblings of each element up to but not including the element matched by the Selection. It returns a new Selection object containing the matched elements.

func (*Selection) Not

func (this *Selection) Not(selector string) *Selection

Not() removes elements from the Selection that match the selector string. It returns a new Selection object with the matching elements removed.

func (*Selection) NotFunction

func (this *Selection) NotFunction(f func(int, *Selection) bool) *Selection

NotFunction() removes elements from the Selection that pass the function's test. It returns a new Selection object with the matching elements removed.

func (*Selection) NotNodes

func (this *Selection) NotNodes(nodes ...*html.Node) *Selection

NotNodes() removes elements from the Selection that match the specified nodes. It returns a new Selection object with the matching elements removed.

func (*Selection) NotSelection

func (this *Selection) NotSelection(s *Selection) *Selection

NotSelection() removes elements from the Selection that match a node in the specified Selection object. It returns a new Selection object with the matching elements removed.

func (*Selection) Parent

func (this *Selection) Parent() *Selection

Parent() gets the parent of each element in the Selection. It returns a new Selection object containing the matched elements.

func (*Selection) ParentFiltered

func (this *Selection) ParentFiltered(selector string) *Selection

ParentFiltered() gets the parent of each element in the Selection filtered by a selector. It returns a new Selection object containing the matched elements.

func (*Selection) Parents

func (this *Selection) Parents() *Selection

Parents() gets the ancestors of each element in the current Selection. It returns a new Selection object with the matched elements.

func (*Selection) ParentsFiltered

func (this *Selection) ParentsFiltered(selector string) *Selection

ParentsFiltered() gets the ancestors of each element in the current Selection, filtered by a selector. It returns a new Selection object with the matched elements.

func (*Selection) ParentsFilteredUntil

func (this *Selection) ParentsFilteredUntil(filterSelector string, untilSelector string) *Selection

ParentsFilteredUntil() is like ParentsUntil(), with the option to filter the results based on a selector string. It returns a new Selection object containing the matched elements.

func (*Selection) ParentsFilteredUntilNodes

func (this *Selection) ParentsFilteredUntilNodes(filterSelector string, nodes ...*html.Node) *Selection

ParentsFilteredUntilNodes() is like ParentsUntilNodes(), with the option to filter the results based on a selector string. It returns a new Selection object containing the matched elements.

func (*Selection) ParentsFilteredUntilSelection

func (this *Selection) ParentsFilteredUntilSelection(filterSelector string, sel *Selection) *Selection

ParentsFilteredUntilSelection() is like ParentsUntilSelection(), with the option to filter the results based on a selector string. It returns a new Selection object containing the matched elements.

func (*Selection) ParentsUntil

func (this *Selection) ParentsUntil(selector string) *Selection

ParentsUntil() gets the ancestors of each element in the Selection, up to but not including the element matched by the selector. It returns a new Selection object containing the matched elements.

func (*Selection) ParentsUntilNodes

func (this *Selection) ParentsUntilNodes(nodes ...*html.Node) *Selection

ParentsUntilNodes() gets the ancestors of each element in the Selection, up to but not including the specified nodes. It returns a new Selection object containing the matched elements.

func (*Selection) ParentsUntilSelection

func (this *Selection) ParentsUntilSelection(sel *Selection) *Selection

ParentsUntilSelection() gets the ancestors of each element in the Selection, up to but not including the elements in the specified Selection. It returns a new Selection object containing the matched elements.

func (*Selection) Prev

func (this *Selection) Prev() *Selection

Prev() gets the immediately preceding sibling of each element in the Selection. It returns a new Selection object containing the matched elements.

func (*Selection) PrevAll

func (this *Selection) PrevAll() *Selection

PrevAll() gets all the preceding siblings of each element in the Selection. It returns a new Selection object containing the matched elements.

func (*Selection) PrevAllFiltered

func (this *Selection) PrevAllFiltered(selector string) *Selection

PrevAllFiltered() gets all the preceding siblings of each element in the Selection filtered by a selector. It returns a new Selection object containing the matched elements.

func (*Selection) PrevFiltered

func (this *Selection) PrevFiltered(selector string) *Selection

PrevFiltered() gets the immediately preceding sibling of each element in the Selection filtered by a selector. It returns a new Selection object containing the matched elements.

func (*Selection) PrevFilteredUntil

func (this *Selection) PrevFilteredUntil(filterSelector string, untilSelector string) *Selection

PrevFilteredUntil() is like PrevUntil(), with the option to filter the results based on a selector string. It returns a new Selection object containing the matched elements.

func (*Selection) PrevFilteredUntilNodes

func (this *Selection) PrevFilteredUntilNodes(filterSelector string, nodes ...*html.Node) *Selection

PrevFilteredUntilNodes() is like PrevUntilNodes(), with the option to filter the results based on a selector string. It returns a new Selection object containing the matched elements.

func (*Selection) PrevFilteredUntilSelection

func (this *Selection) PrevFilteredUntilSelection(filterSelector string, sel *Selection) *Selection

PrevFilteredUntilSelection() is like PrevUntilSelection(), with the option to filter the results based on a selector string. It returns a new Selection object containing the matched elements.

func (*Selection) PrevUntil

func (this *Selection) PrevUntil(selector string) *Selection

PrevUntil() gets all preceding siblings of each element up to but not including the element matched by the selector. It returns a new Selection object containing the matched elements.

func (*Selection) PrevUntilNodes

func (this *Selection) PrevUntilNodes(nodes ...*html.Node) *Selection

PrevUntilNodes() gets all preceding siblings of each element up to but not including the element matched by the nodes. It returns a new Selection object containing the matched elements.

func (*Selection) PrevUntilSelection

func (this *Selection) PrevUntilSelection(sel *Selection) *Selection

PrevUntilSelection() gets all preceding siblings of each element up to but not including the element matched by the Selection. It returns a new Selection object containing the matched elements.

func (*Selection) Siblings

func (this *Selection) Siblings() *Selection

Siblings() gets the siblings of each element in the Selection. It returns a new Selection object containing the matched elements.

func (*Selection) SiblingsFiltered

func (this *Selection) SiblingsFiltered(selector string) *Selection

SiblingsFiltered() gets the siblings of each element in the Selection filtered by a selector. It returns a new Selection object containing the matched elements.

func (*Selection) Size

func (this *Selection) Size() int

Size() is an alias for Length().

func (*Selection) Slice

func (this *Selection) Slice(start int, end int) *Selection

Slice() reduces the set of matched elements to a subset specified by a range of indices. At the moment, negative indices are not supported.

func (*Selection) Text

func (this *Selection) Text() string

Text() gets the combined text contents of each element in the set of matched elements, including their descendants.
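Collecting text "including descendants" amounts to a depth-first walk. The sketch below assumes a hypothetical `node` type where text nodes carry a string and element nodes carry children; it is an illustration of the idea, not goquery's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for *html.Node: text nodes carry text,
// element nodes carry children.
type node struct {
	text     string
	children []*node
}

// collect appends the text of n and of all its descendants, in document order.
func collect(sb *strings.Builder, n *node) {
	sb.WriteString(n.text)
	for _, c := range n.children {
		collect(sb, c)
	}
}

// text concatenates the text of every node in the set, descendants included.
func text(set []*node) string {
	var sb strings.Builder
	for _, n := range set {
		collect(&sb, n)
	}
	return sb.String()
}

func main() {
	// Roughly <p>Hello <em>big</em> world</p> followed by <p>!</p>.
	p1 := &node{children: []*node{
		{text: "Hello "},
		{children: []*node{{text: "big"}}},
		{text: " world"},
	}}
	p2 := &node{children: []*node{{text: "!"}}}
	fmt.Println(text([]*node{p1, p2})) // Hello big world!
}
```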

func (*Selection) Union

func (this *Selection) Union(sel *Selection) *Selection

Union() is an alias for AddSelection().
