Documentation
Types
type Config
type Config struct {
	Writer   output.WriterConfig `yaml:"writer,omitempty"`
	Scrapers []Scraper           `yaml:"scrapers,omitempty"`
	Global   GlobalConfig        `yaml:"global,omitempty"`
}
Config defines the overall structure of the scraper configuration. Values are taken from a YAML config file, from environment variables, or from both.
type CoveredDateParts
type CoveredDateParts struct {
	Day   bool `yaml:"day"`
	Month bool `yaml:"month"`
	Year  bool `yaml:"year"`
	Time  bool `yaml:"time"`
}
CoveredDateParts specifies which parts of a date a DateComponent covers.
type DateComponent
type DateComponent struct {
	Covers          CoveredDateParts `yaml:"covers"`
	ElementLocation ElementLocation  `yaml:"location"`
	Layout          []string         `yaml:"layout"`
}
A DateComponent is used to find a specific part of a date within an HTML document.
type ElementLocation
type ElementLocation struct {
	Selector      string      `yaml:"selector,omitempty"`
	NodeIndex     int         `yaml:"node_index,omitempty"`
	ChildIndex    int         `yaml:"child_index,omitempty"`
	RegexExtract  RegexConfig `yaml:"regex_extract,omitempty"`
	Attr          string      `yaml:"attr,omitempty"`
	MaxLength     int         `yaml:"max_length,omitempty"`
	EntireSubtree bool        `yaml:"entire_subtree,omitempty"`
}
ElementLocation is used to find a specific string in an HTML document.
type Field (added in v0.2.10)
type Field struct {
	Name  string `yaml:"name"`
	Value string `yaml:"value,omitempty"`
	Type  string `yaml:"type,omitempty"` // can currently be text, url or date
	// If a field is located on a subpage, OnSubpage has to contain the name of
	// a field of type 'url' that is located on the main page.
	ElementLocation ElementLocation `yaml:"location,omitempty"`
	OnSubpage       string          `yaml:"on_subpage,omitempty"`    // applies to text, url, date
	CanBeEmpty      bool            `yaml:"can_be_empty,omitempty"`  // applies to text, url
	Components      []DateComponent `yaml:"components,omitempty"`    // applies to date
	DateLocation    string          `yaml:"date_location,omitempty"` // applies to date
	DateLanguage    string          `yaml:"date_language,omitempty"` // applies to date
	Hide            bool            `yaml:"hide,omitempty"`          // applies to text, url, date
}
A Field contains all the information necessary to scrape a dynamic field from a website, i.e. a field whose value changes for each item.
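The struct comment states that a field found on a subpage must name a field of type 'url' located on the main page. The sketch below checks that rule on a list of fields. It uses a trimmed, hypothetical mirror of Field (only three fields, YAML tags omitted), and checkSubpageRefs is an illustrative helper rather than a function of the package.

```go
package main

import "fmt"

// Field is a trimmed, hypothetical mirror of the documented struct.
type Field struct {
	Name      string
	Type      string // text, url or date
	OnSubpage string
}

// checkSubpageRefs is a hypothetical validator: every non-empty OnSubpage
// must name a field of type "url" that itself lives on the main page.
func checkSubpageRefs(fields []Field) error {
	// Collect url fields that are located on the main page.
	urlFields := map[string]bool{}
	for _, f := range fields {
		if f.Type == "url" && f.OnSubpage == "" {
			urlFields[f.Name] = true
		}
	}
	for _, f := range fields {
		if f.OnSubpage != "" && !urlFields[f.OnSubpage] {
			return fmt.Errorf("field %q: on_subpage %q is not a url field on the main page", f.Name, f.OnSubpage)
		}
	}
	return nil
}

func main() {
	fields := []Field{
		{Name: "title", Type: "text"},
		{Name: "link", Type: "url"},
		{Name: "description", Type: "text", OnSubpage: "link"},
	}
	if err := checkSubpageRefs(fields); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("all subpage references are valid")
}
```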
type Filter
type Filter struct {
	Field string `yaml:"field"`
	Regex string `yaml:"regex"`
	Match bool   `yaml:"match"`
}
A Filter is used to filter certain items from the result list.
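The documentation does not spell out how Field, Regex and Match interact. One plausible reading, shown here purely as an assumption, is that an item is kept when "the regex matches the named field's value" equals Match. The keep helper and the map-based item representation are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Filter mirrors the documented struct (YAML tags omitted for brevity).
type Filter struct {
	Field string
	Regex string
	Match bool
}

// keep is a hypothetical helper.
// ASSUMPTION: an item passes the filter when the outcome of matching Regex
// against the value of Field equals f.Match.
func keep(f Filter, item map[string]string) (bool, error) {
	re, err := regexp.Compile(f.Regex)
	if err != nil {
		return false, err
	}
	return re.MatchString(item[f.Field]) == f.Match, nil
}

func main() {
	items := []map[string]string{
		{"title": "Jazz Night"},
		{"title": "Cancelled: Rock Show"},
	}
	// Match: false keeps only items whose title does NOT match the regex.
	f := Filter{Field: "title", Regex: "(?i)cancelled", Match: false}
	for _, item := range items {
		ok, err := keep(f, item)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%q kept=%v\n", item["title"], ok)
	}
}
```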
type GlobalConfig (added in v0.2.1)
type GlobalConfig struct {
	UserAgent string `yaml:"user-agent"`
}
GlobalConfig is used for storing global configuration parameters that are needed across all scrapers.
type RegexConfig
RegexConfig is used for extracting a substring from a string based on the given Exp and Index.
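The struct definition is not shown here, so the sketch below defines its own stand-in. The names Exp and Index come from the description above, while their types and the zero-based index semantics are assumptions. extract is a hypothetical helper showing one way such an extraction could work.

```go
package main

import (
	"fmt"
	"regexp"
)

// RegexConfig is a hypothetical stand-in for the documented type.
type RegexConfig struct {
	Exp   string // ASSUMPTION: a regular expression
	Index int    // ASSUMPTION: zero-based index among all matches
}

// extract returns the match of rc.Exp in s that is selected by rc.Index.
func extract(rc RegexConfig, s string) (string, error) {
	re, err := regexp.Compile(rc.Exp)
	if err != nil {
		return "", err
	}
	matches := re.FindAllString(s, -1)
	if rc.Index < 0 || rc.Index >= len(matches) {
		return "", fmt.Errorf("index %d out of range for %d matches", rc.Index, len(matches))
	}
	return matches[rc.Index], nil
}

func main() {
	rc := RegexConfig{Exp: `[0-9]{2}:[0-9]{2}`, Index: 1}
	got, err := extract(rc, "Doors 19:00, show 20:30")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got) // the second time in the string
}
```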
type Scraper
type Scraper struct {
	Name                string   `yaml:"name"`
	URL                 string   `yaml:"url"`
	Item                string   `yaml:"item"`
	ExcludeWithSelector []string `yaml:"exclude_with_selector,omitempty"`
	Fields              []Field  `yaml:"fields,omitempty"`
	Filters             []Filter `yaml:"filters,omitempty"`
	Paginator           struct {
		Location ElementLocation `yaml:"location,omitempty"`
		MaxPages int             `yaml:"max_pages,omitempty"`
	} `yaml:"paginator,omitempty"`
	RenderJs bool `yaml:"renderJs,omitempty"`
}
A Scraper contains all the config parameters and structs needed to extract the desired information from a website.