spider

package
v0.0.0-...-04f6dc1
Published: Oct 16, 2017 License: MIT Imports: 15 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Option

type Option func(*Spider)

Option is a function that configures the spider.

func WithConcurrency

func WithConcurrency(con int) Option

WithConcurrency sets how many workers request URLs concurrently.

func WithIgnoreRobots

func WithIgnoreRobots(ignore bool) Option

WithIgnoreRobots sets whether or not the spider should ignore the robots.txt data.

func WithRequester

func WithRequester(req Requester) Option

WithRequester sets the requester that the spider should use to make requests.

func WithRoot

func WithRoot(root *url.URL) Option

WithRoot sets the root URL for the spider.

func WithTimeout

func WithTimeout(dur time.Duration) Option

WithTimeout sets the request timeout.

func WithUserAgent

func WithUserAgent(agent string) Option

WithUserAgent overwrites the default user agent.

type Requester

type Requester interface {
	Request(ctx context.Context, uri *url.URL) ([]byte, error)
	SetUserAgent(agent string)
}

Requester is something that can make a request.

type Seener

type Seener interface {
	Seen(*url.URL) bool
}

Seener is something which can check if a URL has ever been seen.

type Spider

type Spider struct {
	// contains filtered or unexported fields
}

Spider can run requests against a URI until it sees every internal page on that site at least once. It can be configured with Option arguments which override defaults.

func New

func New(options ...Option) *Spider

New creates a new spider with the given options.

func (*Spider) Report

func (s *Spider) Report(w io.Writer) error

Report writes the report to the writer.

func (*Spider) Run

func (s *Spider) Run() error

Run the spider. Start at the root and follow all valid URLs, building a map of the site.

Directories

Path	Synopsis
internal/concurrency	Package concurrency provides common concurrency patterns and utilities.
