crawler

package
v0.2.7
Published: Jun 22, 2021 License: MIT Imports: 13 Imported by: 0

Documentation

Overview

Package crawler is an internal package of the tool Crawl, responsible for executing a crawl.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	// Exported configuration fields.
	Connections     int
	UserAgent       string
	RobotsUserAgent string
	Include         []string
	Exclude         []string
	From            []string
	RespectNofollow bool
	MaxDepth        int
	WaitTime        string
	Timeout         string
	Header          []*data.Pair
	// contains filtered or unexported fields
}

func FromJSON

func FromJSON(in io.Reader) (*Crawler, error)
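
A minimal usage sketch of FromJSON. The JSON keys are assumed to mirror the exported configuration fields of Crawler under the standard encoding/json rules, and the import path is illustrative; neither is documented on this page.

package main

import (
	"log"
	"strings"

	"example.invalid/crawl/crawler" // illustrative import path; substitute the real module path
)

func main() {
	// Assumption: keys match the exported configuration fields listed above.
	cfg := strings.NewReader(`{
		"Connections": 20,
		"UserAgent": "Crawl/0.2.7",
		"From": ["https://example.com/"],
		"MaxDepth": 3
	}`)

	c, err := crawler.FromJSON(cfg)
	if err != nil {
		log.Fatal(err)
	}
	_ = c // call c.Start and c.Next to run the crawl (see below)
}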

func (*Crawler) Next

func (c *Crawler) Next() *data.Result

Next returns the next result from the crawl. Results are guaranteed to come out in ascending order of depth. Within a single depth level, there is no guarantee as to which URLs are crawled first.

Result objects are suitable for marshaling into JSON and conform to the schema exported by crawler.Schema.

func (*Crawler) Start

func (c *Crawler) Start() error

Start begins the crawl. The Crawler is a state machine running in its own goroutine; calling Start may therefore initiate many network requests, even before any results are requested from it.

If Start returns a non-nil error, calls to Next will fail.
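
A hedged sketch of a typical consumption loop combining Start and Next. It assumes Next returns nil once the crawl is exhausted (the termination condition is not stated on this page), and the import path is illustrative.

package main

import (
	"encoding/json"
	"log"
	"os"
	"strings"

	"example.invalid/crawl/crawler" // illustrative import path; substitute the real module path
)

func main() {
	c, err := crawler.FromJSON(strings.NewReader(
		`{"From": ["https://example.com/"], "MaxDepth": 2}`))
	if err != nil {
		log.Fatal(err)
	}

	// If Start returns a non-nil error, calls to Next will fail.
	if err := c.Start(); err != nil {
		log.Fatal(err)
	}

	// Assumption: Next returns nil when no results remain.
	enc := json.NewEncoder(os.Stdout)
	for r := c.Next(); r != nil; r = c.Next() {
		if err := enc.Encode(r); err != nil {
			log.Fatal(err)
		}
	}
}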

Directories

Path	Synopsis
data	Package data provides types appropriate for describing the output of a crawler.
