crawl

package
v0.0.0-...-ee38f16
Published: Jan 27, 2018 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Crawl

func Crawl(id int, userAgent string, waiting <-chan *url.URL, processed chan<- *url.URL, content chan<- string)

Crawl reads URLs from the `waiting` channel, places the fetched content on the `content` channel, and places each URL on the `processed` channel.
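
A minimal usage sketch based only on the signature above; the import path, channel buffering, and user-agent string are assumptions, not part of this package.

```go
package main

import (
	"fmt"
	"net/url"

	"example.com/crawl" // assumed import path; substitute the module's real path
)

func main() {
	waiting := make(chan *url.URL, 1)
	processed := make(chan *url.URL, 1)
	content := make(chan string, 1)

	// Start one crawler worker; more workers can share the same channels.
	go crawl.Crawl(1, "my-crawler/0.1", waiting, processed, content)

	u, _ := url.Parse("https://example.com/")
	waiting <- u

	fmt.Println("crawled:", (<-processed).String())
	fmt.Println("HTML length:", len(<-content))
}
```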

func Download

func Download(id int, userAgent string, dir string, wainting <-chan *url.URL, processed chan<- *url.URL)

Download downloads the resource at each URI read from the `waiting` channel into the given `dir` and puts the URI on the `processed` channel.
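
A hedged sketch of running Download as a worker, again assuming the import path; the target directory and user-agent string are placeholders.

```go
package main

import (
	"fmt"
	"net/url"

	"example.com/crawl" // assumed import path; substitute the module's real path
)

func main() {
	waiting := make(chan *url.URL, 1)
	processed := make(chan *url.URL, 1)

	// One download worker writing resources into ./assets.
	go crawl.Download(1, "my-crawler/0.1", "./assets", waiting, processed)

	img, _ := url.Parse("https://example.com/logo.png")
	waiting <- img

	fmt.Println("downloaded:", (<-processed).String())
}
```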

func Harvest

func Harvest(id int, domain *url.URL, filterPattern string, content <-chan string, sites, images chan<- *url.URL)

Harvest extracts URIs from anchor (`a`) and image (`img`) tags in the HTML strings read from the `content` channel and sends them to the `sites` and `images` channels. `domain` is used to resolve relative URLs to absolute URLs.
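
A sketch of feeding Harvest one HTML string, assuming anchor URIs go to `sites` and image URIs to `images`, and that a `filterPattern` of `".*"` matches everything; these routing and pattern semantics are inferred from the description above.

```go
package main

import (
	"fmt"
	"net/url"

	"example.com/crawl" // assumed import path; substitute the module's real path
)

func main() {
	domain, _ := url.Parse("https://example.com")

	content := make(chan string, 1)
	sites := make(chan *url.URL, 8)
	images := make(chan *url.URL, 8)

	go crawl.Harvest(1, domain, ".*", content, sites, images)

	content <- `<a href="/about">About</a><img src="/logo.png">`

	// Relative URLs are expected to be resolved against domain.
	fmt.Println("site:", (<-sites).String())
	fmt.Println("image:", (<-images).String())
}
```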

func MakeURIParser

func MakeURIParser(tag, element string, domain *url.URL, filterPattern string) func(html string) []*url.URL

MakeURIParser returns a function that takes an `html` string and returns the URIs within it that match a given regex pattern. If a parsed URI is a relative URL, the `domain` URL is used to resolve it to an absolute URL.
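
A sketch of building and applying a parser; the `"a"`/`"href"` tag-and-element pair and the `".*"` pattern are illustrative assumptions based on the Harvest description.

```go
package main

import (
	"fmt"
	"net/url"

	"example.com/crawl" // assumed import path; substitute the module's real path
)

func main() {
	domain, _ := url.Parse("https://example.com")

	// Parser for the href values of anchor tags, keeping every match.
	parseLinks := crawl.MakeURIParser("a", "href", domain, ".*")

	for _, u := range parseLinks(`<a href="/docs">Docs</a> <a href="https://other.org/x">X</a>`) {
		fmt.Println(u.String())
	}
}
```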

func ParseHTMLElementValues

func ParseHTMLElementValues(html, tag, element string) []string

ParseHTMLElementValues parses `html` and returns the values of the specified `element` for each occurrence of the specified `tag`.
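
A sketch under the assumption that `element` names the attribute whose values are returned (for example `src` on `img` tags); that interpretation is inferred from MakeURIParser, not stated by the package.

```go
package main

import (
	"fmt"

	"example.com/crawl" // assumed import path; substitute the module's real path
)

func main() {
	html := `<img src="/logo.png" alt="logo"> <img src="/banner.jpg">`

	// Assumption: returns the src value of every img tag in html.
	for _, v := range crawl.ParseHTMLElementValues(html, "img", "src") {
		fmt.Println(v)
	}
}
```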

Types

This section is empty.
