Documentation ¶
Index ¶
- func Crawl(id int, userAgent string, waiting <-chan *url.URL, processed chan<- *url.URL, ...)
- func Download(id int, userAgent string, dir string, wainting <-chan *url.URL, ...)
- func Harvest(id int, domain *url.URL, filterPattern string, content <-chan string, ...)
- func MakeURIParser(tag, element string, domain *url.URL, filterPattern string) func(html string) []*url.URL
- func ParseHTMLElementValues(html, tag, element string) []string
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Crawl ¶
func Crawl(id int, userAgent string, waiting <-chan *url.URL, processed chan<- *url.URL, content chan<- string)
Crawl reads URLs from the `waiting` channel, places the content of each URL on the `content` channel, and places the URL on the `processed` channel.
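The channel flow described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `crawlWorker` and `runDemo` are hypothetical names, the worker is assumed to issue a plain HTTP GET with the given User-Agent, and a local `httptest` server stands in for a real site so the sketch runs without network access.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// crawlWorker is a hypothetical stand-in with the same parameter shape as
// Crawl. It assumes each URL is fetched with a GET request.
func crawlWorker(id int, userAgent string, waiting <-chan *url.URL, processed chan<- *url.URL, content chan<- string) {
	client := &http.Client{}
	for u := range waiting {
		req, err := http.NewRequest(http.MethodGet, u.String(), nil)
		if err != nil {
			continue // skip URLs that cannot form a request
		}
		req.Header.Set("User-Agent", userAgent)
		resp, err := client.Do(req)
		if err != nil {
			continue
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			continue
		}
		content <- string(body) // hand the page to the next stage
		processed <- u          // report the URL as done
	}
}

// runDemo wires one worker to a local test server and returns the page body
// plus whether the processed URL is the one that was queued.
func runDemo() (string, bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "<html>agent=%s</html>", r.UserAgent())
	}))
	defer srv.Close()

	target, err := url.Parse(srv.URL)
	if err != nil {
		panic(err)
	}
	waiting := make(chan *url.URL, 1)
	processed := make(chan *url.URL, 1)
	content := make(chan string, 1)

	go crawlWorker(1, "example-bot/0.1", waiting, processed, content)
	waiting <- target
	close(waiting) // lets the worker's range loop end

	return <-content, <-processed == target
}

func main() {
	body, ok := runDemo()
	fmt.Println(body, ok)
}
```

Buffered channels keep this single-URL demo from blocking; a long-running crawl would more likely have separate goroutines draining `content` and `processed`.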
func Download ¶
func Download(id int, userAgent string, dir string, wainting <-chan *url.URL, processed chan<- *url.URL)
Download downloads the resources at the URIs in the `waiting` channel to the given `dir` and puts each URI in the `processed` channel.
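A download worker of this shape might look like the sketch below. Everything here is an assumption made for illustration: `saveResource`, `downloadWorker`, and `runDemo` are hypothetical names, and naming the file after the last path segment of the URI is a guess, not documented behavior.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"os"
	"path"
	"path/filepath"
)

// saveResource (hypothetical) fetches u and writes the body into dir,
// using the last path segment of the URI as the file name.
func saveResource(client *http.Client, userAgent, dir string, u *url.URL) (string, error) {
	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("User-Agent", userAgent)
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: %s", u, resp.Status)
	}
	name := path.Base(u.Path)
	if name == "." || name == "/" {
		name = "index" // fallback for URIs without a file name
	}
	dst := filepath.Join(dir, name)
	f, err := os.Create(dst)
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := io.Copy(f, resp.Body); err != nil {
		return "", err
	}
	return dst, nil
}

// downloadWorker (hypothetical) mirrors the parameter shape of Download.
func downloadWorker(id int, userAgent, dir string, waiting <-chan *url.URL, processed chan<- *url.URL) {
	client := &http.Client{}
	for u := range waiting {
		if _, err := saveResource(client, userAgent, dir, u); err != nil {
			continue // a real worker would likely log the failure
		}
		processed <- u
	}
}

// runDemo downloads one file from a local test server into a temporary
// directory and returns the saved file's contents.
func runDemo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "fake image bytes")
	}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "download-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	u, err := url.Parse(srv.URL + "/img/logo.png")
	if err != nil {
		return "", err
	}
	waiting := make(chan *url.URL, 1)
	processed := make(chan *url.URL, 1)
	go downloadWorker(1, "example-bot/0.1", dir, waiting, processed)
	waiting <- u
	close(waiting)
	<-processed // wait until the worker reports the URI as done

	data, err := os.ReadFile(filepath.Join(dir, "logo.png"))
	return string(data), err
}

func main() {
	fmt.Println(runDemo())
}
```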
func Harvest ¶
func Harvest(id int, domain *url.URL, filterPattern string, content <-chan string, sites, images chan<- *url.URL)
Harvest reads HTML strings from the `content` channel and extracts URIs from their `anchor` (a) and `image` (img) tags. `domain` is used to resolve relative URLs to full URLs.
func MakeURIParser ¶
func MakeURIParser(tag, element string, domain *url.URL, filterPattern string) func(html string) []*url.URL
MakeURIParser returns a function that takes an `html` string and returns the list of URIs that match the given regex `filterPattern`. If a parsed URI is a relative URL, the `domain` URL is used to resolve it to an absolute URL.
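The closure pattern described here can be sketched roughly as below. This is not the package's code: `makeURIParser` and `demo` are hypothetical names, `element` is assumed to name the attribute that holds the URI (for example `href`), extraction is done with a simple regular expression, and the filter is assumed to apply to the resolved absolute URL.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// makeURIParser is a hypothetical sketch. It assumes `tag` is an HTML tag
// name (e.g. "a") and `attr` is the attribute holding the URI (e.g. "href").
func makeURIParser(tag, attr string, domain *url.URL, filterPattern string) func(html string) []*url.URL {
	// Both expressions are compiled once and reused by the closure.
	extract := regexp.MustCompile(`(?is)<` + regexp.QuoteMeta(tag) +
		`\b[^>]*?\b` + regexp.QuoteMeta(attr) + `\s*=\s*["']([^"']*)["']`)
	filter := regexp.MustCompile(filterPattern)

	return func(html string) []*url.URL {
		var out []*url.URL
		for _, m := range extract.FindAllStringSubmatch(html, -1) {
			ref, err := url.Parse(m[1])
			if err != nil {
				continue // ignore values that are not valid URIs
			}
			// ResolveReference leaves absolute URLs unchanged and
			// resolves relative ones against domain.
			abs := domain.ResolveReference(ref)
			if filter.MatchString(abs.String()) {
				out = append(out, abs)
			}
		}
		return out
	}
}

// demo parses a small page and returns the matching URLs as strings.
func demo() []string {
	domain, err := url.Parse("https://example.com/")
	if err != nil {
		panic(err)
	}
	parseLinks := makeURIParser("a", "href", domain, `^https://example\.com/`)

	page := `<a href="/about">About</a> <a href="https://other.test/x">X</a> <img src="/logo.png">`
	var got []string
	for _, u := range parseLinks(page) {
		got = append(got, u.String())
	}
	return got
}

func main() {
	fmt.Println(demo())
}
```

Returning a closure lets a caller build one parser per tag (for instance one for anchors and one for images) and reuse it for every page without recompiling the expressions.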
func ParseHTMLElementValues ¶
func ParseHTMLElementValues(html, tag, element string) []string
ParseHTMLElementValues parses the given `html` string for the specified `element` of the specified `tag` and returns the values found.
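One way to read this description is sketched below, under the assumption that `element` names an attribute (such as `href` or `src`) whose values are collected from every occurrence of `tag`. `parseAttrValues` is a hypothetical helper that uses a regular expression; the package may well use a different approach.

```go
package main

import (
	"fmt"
	"regexp"
)

// parseAttrValues is a hypothetical sketch: it returns the quoted values of
// attribute attr on every tag named tag, matching case-insensitively.
func parseAttrValues(html, tag, attr string) []string {
	re := regexp.MustCompile(`(?is)<` + regexp.QuoteMeta(tag) +
		`\b[^>]*?\b` + regexp.QuoteMeta(attr) + `\s*=\s*["']([^"']*)["']`)
	var values []string
	for _, m := range re.FindAllStringSubmatch(html, -1) {
		values = append(values, m[1]) // m[1] is the captured attribute value
	}
	return values
}

func main() {
	page := `<a href="/one">1</a><img src="/pic.png"><A HREF='two.html'>2</A>`
	fmt.Println(parseAttrValues(page, "a", "href"))
	fmt.Println(parseAttrValues(page, "img", "src"))
}
```

A regular expression is enough for a sketch like this, but it will miss unquoted attribute values and can be confused by markup inside comments or scripts; a real HTML tokenizer is more robust.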
Types ¶
This section is empty.