Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Config ¶
type Config struct {
	// An API for managing and iterating links and edges in the link graph.
	GraphAPI GraphAPI

	// An API for indexing documents.
	IndexAPI IndexAPI

	// An API for detecting private network addresses. If not specified,
	// a default implementation that handles the private network ranges
	// defined in RFC1918 will be used instead.
	PrivateNetworkDetector crawler_pipeline.PrivateNetworkDetector

	// An API for performing HTTP requests. If not specified,
	// http.DefaultClient will be used instead.
	URLGetter crawler_pipeline.URLGetter

	// An API for detecting the partition assignments for this service.
	PartitionDetector partition.Detector

	// A clock instance for generating time-related events. If not specified,
	// the default wall-clock will be used instead.
	Clock clock.Clock

	// The number of concurrent workers used for retrieving links.
	FetchWorkers int

	// The time between subsequent crawler passes.
	UpdateInterval time.Duration

	// The minimum amount of time before re-indexing an already-crawled link.
	ReIndexThreshold time.Duration

	// The logger to use. If not defined, an output-discarding logger will
	// be used instead.
	Logger *logrus.Entry
}
Config encapsulates the settings for configuring the web-crawler service.
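As a rough illustration, a caller might populate only the mandatory fields and rely on the documented defaults for the rest. This is a minimal sketch: the crawler and partition import paths below are hypothetical placeholders, and the GraphAPI/IndexAPI values are assumed to be supplied elsewhere.

package example

import (
	"time"

	"github.com/sirupsen/logrus"

	// Hypothetical import paths; adjust them to the actual module layout.
	"example.com/linksrus/partition"
	crawler "example.com/linksrus/service/crawler"
)

// buildCrawlerConfig wires a Config, leaving PrivateNetworkDetector, URLGetter
// and Clock unset so the documented defaults (RFC1918 detector,
// http.DefaultClient, wall clock) apply.
func buildCrawlerConfig(g crawler.GraphAPI, i crawler.IndexAPI, pd partition.Detector) crawler.Config {
	return crawler.Config{
		GraphAPI:          g,
		IndexAPI:          i,
		PartitionDetector: pd,
		FetchWorkers:      64,              // concurrent link fetchers
		UpdateInterval:    5 * time.Minute, // pause between crawl passes
		ReIndexThreshold:  6 * time.Hour,   // re-crawl links older than this
		Logger:            logrus.NewEntry(logrus.New()),
	}
}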
type GraphAPI ¶
type GraphAPI interface {
	UpsertLink(link *graph.Link) error
	UpsertEdge(edge *graph.Edge) error

	RemoveStaleEdges(fromID uuid.UUID, updatedBefore time.Time) error

	Links(fromID, toID uuid.UUID, retrievedBefore time.Time) (graph.LinkIterator, error)
}
GraphAPI defines a set of API methods for accessing the link graph.
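For tests, a no-op stub is often enough to satisfy this interface. The sketch below is hypothetical: the graph and uuid import paths are placeholders, and a real test double would record calls and return a populated graph.LinkIterator rather than nil.

package crawlertest

import (
	"time"

	// Hypothetical import paths; match whichever packages the link graph
	// actually uses for its uuid and graph types.
	"github.com/google/uuid"

	"example.com/linksrus/linkgraph/graph"
)

// nopGraph is a do-nothing GraphAPI stub for exercising the crawler in tests.
type nopGraph struct{}

func (nopGraph) UpsertLink(*graph.Link) error                { return nil }
func (nopGraph) UpsertEdge(*graph.Edge) error                { return nil }
func (nopGraph) RemoveStaleEdges(uuid.UUID, time.Time) error { return nil }

func (nopGraph) Links(_, _ uuid.UUID, _ time.Time) (graph.LinkIterator, error) {
	// A real double would return an iterator over canned links; returning nil
	// keeps this sketch free of assumptions about the iterator's method set.
	return nil, nil
}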
type Service ¶
type Service struct {
// contains filtered or unexported fields
}
Service implements the web-crawler component for the Links 'R' Us project.
func NewService ¶
NewService creates a new crawler service instance with the specified config.
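A minimal usage sketch, assuming NewService returns (*Service, error) and reusing the hypothetical crawler import path from the Config example above:

func runCrawler(cfg crawler.Config) error {
	// NewService validates cfg, applying the documented defaults for any
	// optional fields that were left unset.
	svc, err := crawler.NewService(cfg)
	if err != nil {
		return err
	}
	_ = svc // start and stop the service according to the application's lifecycle
	return nil
}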