Documentation ¶
Overview ¶
Package crawler is an internal package of the tool Crawl, responsible for executing a crawl.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Crawler ¶
type Crawler struct {
	// Exported configuration fields.
	Connections     int
	UserAgent       string
	RobotsUserAgent string
	Include         []string
	Exclude         []string
	From            []string
	RespectNofollow bool
	MaxDepth        int
	WaitTime        string
	Timeout         string
	Header          []*data.Pair
	// contains filtered or unexported fields
}
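As a sketch of how these fields might be populated, the composite literal below sets only the exported configuration. The import paths, the package name crawlclient, the treatment of Include and Exclude as pattern strings, and the shape of data.Pair (assumed here to be a Key/Value string pair) are all assumptions; since crawler is internal, it is importable only from within the Crawl tool's own module, which may instead provide a constructor.

	package crawlclient

	import (
		// Hypothetical import paths: crawler is internal to the Crawl
		// tool, so these resolve only inside that module.
		"github.com/example/crawl/internal/crawler"
		"github.com/example/crawl/internal/data"
	)

	// newCrawler builds a Crawler using the exported configuration
	// fields. Whether the package intends literal construction or
	// provides a constructor is not stated above.
	func newCrawler() *crawler.Crawler {
		return &crawler.Crawler{
			Connections:     20, // number of concurrent connections
			UserAgent:       "Crawl/1.0 (+https://example.com/bot)",
			RobotsUserAgent: "Crawl",
			Include:         []string{`^https?://example\.com/`},
			Exclude:         []string{`\.pdf$`},
			From:            []string{"https://example.com/"}, // seed URLs
			RespectNofollow: true,
			MaxDepth:        3,
			WaitTime:        "500ms", // durations are strings per the field types
			Timeout:         "30s",
			Header: []*data.Pair{
				// data.Pair is assumed to be a Key/Value string pair.
				{Key: "Accept-Language", Value: "en-US"},
			},
		}
	}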
func (*Crawler) Next ¶
Next returns the next result from the crawl. Results are returned in ascending order of depth; within a given depth level, there is no guarantee as to which URLs will be crawled first.
Result objects are suitable for marshaling into JSON and conform to the schema exported by the crawler.Schema package.
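A minimal consumption loop might look like the following. That Next returns nil once the crawl is exhausted is an assumption; the documentation above does not state the termination condition. The import path remains hypothetical, as before.

	package crawlclient

	import (
		"encoding/json"
		"os"

		// Hypothetical import path; crawler is internal to the Crawl tool.
		"github.com/example/crawl/internal/crawler"
	)

	// drain writes each result as a line of JSON until the crawl is
	// done, relying on the assumed nil sentinel from Next.
	func drain(c *crawler.Crawler) error {
		enc := json.NewEncoder(os.Stdout)
		for result := c.Next(); result != nil; result = c.Next() {
			if err := enc.Encode(result); err != nil {
				return err
			}
		}
		return nil
	}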
func (*Crawler) Start ¶
Start starts the Crawler. The Crawler is a state machine running in its own goroutine; therefore, calling this method may initiate many network requests, even before any results are requested.
If Start returns a non-nil error, calls to Next will fail.
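Tying the two methods together, a caller would check Start's error before asking for results, since Next fails after a failed Start. This continues the sketch above and reuses its drain helper.

	// run drives a configured Crawler from start to completion.
	func run(c *crawler.Crawler) error {
		// Start launches the crawler's state machine in its own
		// goroutine, so network requests may already be in flight
		// before the first call to Next.
		if err := c.Start(); err != nil {
			// Per the documentation, calls to Next will fail after
			// a non-nil error here.
			return err
		}
		return drain(c) // the consumption loop sketched earlier
	}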