Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type CoveredDateParts
type CoveredDateParts struct {
	Day   bool `yaml:"day"`
	Month bool `yaml:"month"`
	Year  bool `yaml:"year"`
	Time  bool `yaml:"time"`
}
CoveredDateParts determines which parts of a date a DateComponent covers.
type DateComponent
type DateComponent struct {
	Covers          CoveredDateParts `yaml:"covers"`
	ElementLocation ElementLocation  `yaml:"location"`
	Layout          []string         `yaml:"layout"`
}
A DateComponent is used to find a specific part of a date within an HTML document.
type DynamicField
type DynamicField struct {
	Name string `yaml:"name"`
	Type string `yaml:"type"` // can currently be text, url or date
	// If a field can be found on a subpage, OnSubpage has to contain the name of
	// a field of type 'url' that is located on the main page.
	ElementLocation ElementLocation `yaml:"location"`
	OnSubpage       string          `yaml:"on_subpage"`    // applies to text, url, date
	CanBeEmpty      bool            `yaml:"can_be_empty"`  // applies to text, url
	Components      []DateComponent `yaml:"components"`    // applies to date
	DateLocation    string          `yaml:"date_location"` // applies to date
	DateLanguage    string          `yaml:"date_language"` // applies to date
	Relative        bool            `yaml:"relative"`      // applies to url
	Hide            bool            `yaml:"hide"`          // applies to text, url, date
}
A DynamicField contains all the information necessary to scrape a dynamic field from a website, i.e. a field whose value changes for each item.
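The Relative flag applies to fields of type url. The documentation does not say how it is evaluated, so the following is only a sketch of one likely interpretation: when Relative is true, the scraped value is resolved against the URL of the page it was found on. The function `resolveURL` and the example URLs are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveURL is a hypothetical helper. Assumption: a field marked as
// relative holds a link that must be resolved against the page URL,
// while a non-relative field is returned unchanged.
func resolveURL(pageURL, scraped string, relative bool) (string, error) {
	if !relative {
		return scraped, nil
	}
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(scraped)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	full, err := resolveURL("https://example.com/events/", "item/1", true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(full)
}
```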
type ElementLocation
type ElementLocation struct {
	Selector     string      `yaml:"selector"`
	NodeIndex    int         `yaml:"node_index"`
	ChildIndex   int         `yaml:"child_index"`
	RegexExtract RegexConfig `yaml:"regex_extract"`
	Attr         string      `yaml:"attr"`
	MaxLength    int         `yaml:"max_length"`
}
ElementLocation is used to find a specific string in an HTML document.
type Filter
type Filter struct {
	Field string `yaml:"field"`
	Regex string `yaml:"regex"`
	Match bool   `yaml:"match"`
}
A Filter is used to filter certain items from the result list.
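The exact filtering semantics are not spelled out here. As a rough sketch, assume that an item is kept when the result of matching Regex against the named Field equals Match, so that `Match: true` keeps matching items and `Match: false` drops them. The local `Filter` type mirrors the struct above, and `keepItem` is a hypothetical helper, not the package's implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// Filter mirrors the struct documented above (YAML tags omitted).
type Filter struct {
	Field string
	Regex string
	Match bool
}

// keepItem is a hypothetical helper illustrating one possible semantics:
// keep the item if "value matches Regex" equals f.Match.
// Assumption: items that lack the field are kept.
func keepItem(item map[string]string, f Filter) (bool, error) {
	re, err := regexp.Compile(f.Regex)
	if err != nil {
		return false, err
	}
	value, ok := item[f.Field]
	if !ok {
		return true, nil
	}
	return re.MatchString(value) == f.Match, nil
}

func main() {
	items := []map[string]string{
		{"title": "Jazz Night"},
		{"title": "CANCELLED: Rock Show"},
	}
	// Drop every item whose title mentions a cancellation.
	f := Filter{Field: "title", Regex: "(?i)cancelled", Match: false}

	for _, item := range items {
		keep, err := keepItem(item, f)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%q keep=%v\n", item["title"], keep)
	}
}
```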
type RegexConfig
RegexConfig is used for extracting a substring from a string based on the given Exp and Index.
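The struct definition is not shown here, so the sketch below assumes that Exp is a regular expression string and Index is an integer that selects one of its matches. Both the local `RegexConfig` type and the `extract` helper are hypothetical stand-ins that illustrate the idea.

```go
package main

import (
	"fmt"
	"regexp"
)

// RegexConfig is a hypothetical stand-in for the package type.
// Assumption: Exp is a regular expression and Index picks one match.
type RegexConfig struct {
	Exp   string
	Index int
}

// extract returns the match of rc.Exp in s at position rc.Index,
// counting all non-overlapping matches from the left, starting at 0.
func extract(s string, rc RegexConfig) (string, error) {
	re, err := regexp.Compile(rc.Exp)
	if err != nil {
		return "", err
	}
	matches := re.FindAllString(s, -1)
	if rc.Index < 0 || rc.Index >= len(matches) {
		return "", fmt.Errorf("index %d out of range for %d matches", rc.Index, len(matches))
	}
	return matches[rc.Index], nil
}

func main() {
	// Pick the second time of day from a free-text string.
	rc := RegexConfig{Exp: `[0-9]{2}:[0-9]{2}`, Index: 1}
	got, err := extract("Doors 19:00, show 20:30", rc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```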
type Scraper
type Scraper struct {
	Name                string   `yaml:"name"`
	URL                 string   `yaml:"url"`
	Item                string   `yaml:"item"`
	ExcludeWithSelector []string `yaml:"exclude_with_selector"`
	Fields              struct {
		Static  []StaticField  `yaml:"static"`
		Dynamic []DynamicField `yaml:"dynamic"`
	} `yaml:"fields"`
	Filters   []Filter `yaml:"filters"`
	Paginator struct {
		Selector  string `yaml:"selector"`
		Relative  bool   `yaml:"relative"`
		MaxPages  int    `yaml:"max_pages"`
		NodeIndex int    `yaml:"node_index"`
	}
}
A Scraper contains all the config parameters and structs needed to extract the desired information from a website.
type StaticField
A StaticField defines a field that has a fixed name and value across all scraped items.