Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Config ¶
type Config struct {
	// An API for managing and iterating links and edges in the link graph.
	GraphAPI GraphAPI

	// An API for indexing documents.
	IndexAPI IndexAPI

	// An API for detecting private network addresses. If not specified,
	// a default implementation that handles the private network ranges
	// defined in RFC1918 will be used instead.
	PrivateNetworkDetector crawler_pipeline.PrivateNetworkDetector

	// An API for performing HTTP requests. If not specified,
	// http.DefaultClient will be used instead.
	URLGetter crawler_pipeline.URLGetter

	// An API for detecting the partition assignments for this service.
	PartitionDetector partition.Detector

	// A clock instance for generating time-related events. If not specified,
	// the default wall-clock will be used instead.
	Clock clock.Clock

	// The number of concurrent workers used for retrieving links.
	FetchWorkers int

	// The time between subsequent crawler passes.
	UpdateInterval time.Duration

	// The minimum amount of time before re-indexing an already-crawled link.
	ReIndexThreshold time.Duration

	// The logger to use. If not defined, an output-discarding logger will
	// be used instead.
	Logger *logrus.Entry
}
Config encapsulates the settings for configuring the web-crawler service.
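As a rough illustration, a caller might populate only the mandatory fields and rely on the documented defaults for the rest. This is a minimal sketch: the crawler and partition import paths below are hypothetical placeholders, and the GraphAPI/IndexAPI values are assumed to be supplied elsewhere.

package example

import (
	"time"

	"github.com/sirupsen/logrus"

	// Hypothetical import paths; adjust them to the actual module layout.
	"example.com/linksrus/partition"
	crawler "example.com/linksrus/service/crawler"
)

// buildCrawlerConfig wires a Config, leaving PrivateNetworkDetector, URLGetter
// and Clock unset so the documented defaults (RFC1918 detector,
// http.DefaultClient, wall clock) apply.
func buildCrawlerConfig(g crawler.GraphAPI, i crawler.IndexAPI, pd partition.Detector) crawler.Config {
	return crawler.Config{
		GraphAPI:          g,
		IndexAPI:          i,
		PartitionDetector: pd,
		FetchWorkers:      64,              // concurrent link fetchers
		UpdateInterval:    5 * time.Minute, // pause between crawl passes
		ReIndexThreshold:  6 * time.Hour,   // re-crawl links older than this
		Logger:            logrus.NewEntry(logrus.New()),
	}
}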
type GraphAPI ¶
type GraphAPI interface {
	UpsertLink(link *graph.Link) error
	UpsertEdge(edge *graph.Edge) error

	RemoveStaleEdges(fromID uuid.UUID, updatedBefore time.Time) error

	Links(fromID, toID uuid.UUID, retrievedBefore time.Time) (graph.LinkIterator, error)
}
GraphAPI defines a set of API methods for accessing the link graph.
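For tests, a no-op stub is often enough to satisfy this interface. The sketch below is hypothetical: the graph and uuid import paths are placeholders, and a real test double would record calls and return a populated graph.LinkIterator rather than nil.

package crawlertest

import (
	"time"

	// Hypothetical import paths; match whichever packages the link graph
	// actually uses for its uuid and graph types.
	"github.com/google/uuid"

	"example.com/linksrus/linkgraph/graph"
)

// nopGraph is a do-nothing GraphAPI stub for exercising the crawler in tests.
type nopGraph struct{}

func (nopGraph) UpsertLink(*graph.Link) error                { return nil }
func (nopGraph) UpsertEdge(*graph.Edge) error                { return nil }
func (nopGraph) RemoveStaleEdges(uuid.UUID, time.Time) error { return nil }

func (nopGraph) Links(_, _ uuid.UUID, _ time.Time) (graph.LinkIterator, error) {
	// A real double would return an iterator over canned links; returning nil
	// keeps this sketch free of assumptions about the iterator's method set.
	return nil, nil
}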
type Service ¶
type Service struct {
// contains filtered or unexported fields
}
Service implements the web-crawler component for the Links 'R' Us project.
func NewService ¶
NewService creates a new crawler service instance with the specified config.
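A minimal usage sketch, assuming NewService returns (*Service, error) and reusing the hypothetical crawler import path from the Config example above:

func runCrawler(cfg crawler.Config) error {
	// NewService validates cfg, applying the documented defaults for any
	// optional fields that were left unset.
	svc, err := crawler.NewService(cfg)
	if err != nil {
		return err
	}
	_ = svc // start and stop the service according to the application's lifecycle
	return nil
}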