Documentation ¶
Overview ¶
Package scraper provides a straightforward interface for scraping web content.
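A minimal end-to-end sketch (the URL is a placeholder, error handling is abbreviated, and the nil check on a missed Find is an assumption for illustration):

	resp, err := http.Get("https://example.com")
	if err != nil {
		log.Fatal(err)
	}
	// NewFromBuffer closes resp.Body for us.
	s, err := scraper.NewFromBuffer(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	// Assumption: Find returns nil when nothing matches.
	if title := s.Find(scraper.Filter{Tag: "title"}); title != nil {
		fmt.Println(title.TextOptimistic())
	}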
Index ¶
- func ContentMissingError() error
- func MarshallingError(err error) error
- func RenderingError(err error) error
- type Attributes
- type EmptyTarget
- func (EmptyTarget) Content() *html.Node
- func (EmptyTarget) IsValid() bool
- func (EmptyTarget) Render() (string, error)
- func (EmptyTarget) RenderingError() error
- type Filter
- type Scraper
- func NewFromBuffer(buffer io.ReadCloser) (*Scraper, error)
- func (scraper Scraper) Attributes() Attributes
- func (scraper Scraper) Content() *html.Node
- func (scraper Scraper) Find(filter Filter) *Scraper
- func (scraper Scraper) FindAll(filter Filter) <-chan *Scraper
- func (scraper Scraper) Render() (string, error)
- func (scraper Scraper) Text() (string, bool)
- func (scraper Scraper) TextOptimistic() string
- func (scraper Scraper) Type() string
- type Target
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ContentMissingError ¶
func ContentMissingError() error
func MarshallingError ¶
func MarshallingError(err error) error
func RenderingError ¶
func RenderingError(err error) error
Types ¶
type Attributes ¶
Attributes specifies tag attributes to be searched for by the Scraper's Find methods. It is a convenience shorthand for `map[string]string` and can contain any number of attribute pairs. Note that multiple attributes are resolved with an `&&` operator: a node matches only if every listed attribute matches.
scraperInstance.FindAll(scraper.Filter{Attributes: scraper.Attributes{"class": "someClass"}})
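For example, a sketch requiring two attributes at once (per the `&&` semantics, a node matches only if both are present; the attribute values are placeholders):

	filter := scraper.Filter{Attributes: scraper.Attributes{
		"class": "someClass",
		"role":  "main",
	}}
	for match := range scraperInstance.FindAll(filter) {
		fmt.Println(match.TextOptimistic())
	}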
type EmptyTarget ¶
type EmptyTarget struct {
// contains filtered or unexported fields
}
func (EmptyTarget) Content ¶
func (EmptyTarget) Content() *html.Node
func (EmptyTarget) IsValid ¶
func (EmptyTarget) IsValid() bool
func (EmptyTarget) Render ¶
func (EmptyTarget) Render() (string, error)
func (EmptyTarget) RenderingError ¶
func (EmptyTarget) RenderingError() error
type Filter ¶
type Filter struct {
	Tag        string
	Attributes Attributes
	IsExact    bool
	// contains filtered or unexported fields
}
Filter is the input to the Scraper's Find methods. It can be populated with a tag name, attributes (see `Attributes`), or both. Note that multiple filter arguments are resolved with an `&&` operator: a node must satisfy all of them to match.
scraperInstance.FindAll(scraper.Filter{Tag: "div"})
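For example, a sketch matching only div elements that also carry a given class (both conditions must hold):

	scraperInstance.FindAll(scraper.Filter{
		Tag:        "div",
		Attributes: scraper.Attributes{"class": "someClass"},
	})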
type Scraper ¶
type Scraper struct {
// contains filtered or unexported fields
}
Scraper is the base type used to scrape content. Do not instantiate it directly; use one of the provided New* functions instead.
func NewFromBuffer ¶
func NewFromBuffer(buffer io.ReadCloser) (*Scraper, error)
NewFromBuffer instantiates a new Scraper instance from a given `io.ReadCloser`, such as the `Body` of an `http.Response` (net/http). You should consider using `NewFromURI` instead if your requested resource is trivial to fetch. Note that this function will close the handle for you.
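For example (the URL is a placeholder and error handling is abbreviated):

	resp, err := http.Get("https://example.com/page.html")
	if err != nil {
		log.Fatal(err)
	}
	// resp.Body is closed for us.
	s, err := scraper.NewFromBuffer(resp.Body)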
func NewFromNode ¶
NewFromNode instantiates a new Scraper instance from a given `html.Node` (golang.org/x/net/html). It is used internally to allow scraping the results of a previous scrape, but is exported in case you want to build a hybrid.
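The exact signature is not shown above, so the following hybrid is only a sketch that assumes NewFromNode accepts a *html.Node and returns a *Scraper:

	// Parse markup yourself with golang.org/x/net/html;
	// reader is any io.Reader containing HTML.
	node, err := html.Parse(reader)
	if err != nil {
		log.Fatal(err)
	}
	// Then wrap the node (assumed signature).
	s := scraper.NewFromNode(node)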
func (Scraper) Attributes ¶
func (scraper Scraper) Attributes() Attributes
Attributes returns a map of all attributes on the node.
func (Scraper) Content ¶
func (scraper Scraper) Content() *html.Node
Content returns the node the Scraper instance is wrapping. It should be considered a lower-level API.
func (Scraper) Find ¶
func (scraper Scraper) Find(filter Filter) *Scraper
Find returns the first node matching the provided Filter. Note that this method is currently very inefficient and needs to be reimplemented.
func (Scraper) FindAll ¶
func (scraper Scraper) FindAll(filter Filter) <-chan *Scraper
FindAll returns all nodes matching the provided Filter. TODO: better way to track completion?
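The channel can be drained with a plain range loop; this sketch assumes it is closed once the search completes (which the TODO above appears to concern):

	for match := range scraperInstance.FindAll(scraper.Filter{Tag: "a"}) {
		fmt.Println(match.Attributes()["href"])
	}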
func (Scraper) Render ¶
func (scraper Scraper) Render() (string, error)
Render returns a rendered version of the Scraper's content. Note that the rendering is best-effort (see golang.org/x/net/html/render.go).
func (Scraper) Text ¶
func (scraper Scraper) Text() (string, bool)
Text returns the text embedded in the node. If other tags are nested under it, it returns an empty string and a false ok value.
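The boolean follows Go's usual comma-ok pattern:

	if text, ok := scraperInstance.Text(); ok {
		fmt.Println(text)
	}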
func (Scraper) TextOptimistic ¶
func (scraper Scraper) TextOptimistic() string
TextOptimistic is an optimistic version of Text that simply returns an empty string if anything goes wrong (see the Text docs). It is useful for inlining operations if you trust your inputs.
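For example, inlined after a Find call you trust to succeed:

	fmt.Println(scraperInstance.Find(scraper.Filter{Tag: "h1"}).TextOptimistic())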
func (Scraper) Type ¶
func (scraper Scraper) Type() string
type Target ¶
type Target interface {
	// Render returns a pretty-rendered version of the target's scope
	Render() (string, error)
	// Content returns the tree-structure representation of the target
	Content() *html.Node
	IsValid() bool
}
Target represents a scope that can be parsed into structured data or rendered as such. It is an implementation detail meant to allow better encapsulation for the different ways of instantiating a Scraper.