Documentation
Index ¶
Constants ¶
const (
    WAITING uint8 = 0
    STOPPED uint8 = 1
    RUNNING uint8 = 2
)
Possible Worker states
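Since the state codes are plain uint8 values, a small helper for turning them into labels can make logging easier. A minimal sketch; the `stateName` function is hypothetical, not part of the package:

```go
package main

import "fmt"

// Worker states, mirroring the package constants.
const (
    WAITING uint8 = 0
    STOPPED uint8 = 1
    RUNNING uint8 = 2
)

// stateName is a hypothetical helper that turns a state code
// into a human-readable label for logging.
func stateName(s uint8) string {
    switch s {
    case WAITING:
        return "WAITING"
    case STOPPED:
        return "STOPPED"
    case RUNNING:
        return "RUNNING"
    default:
        return "UNKNOWN"
    }
}

func main() {
    fmt.Println(stateName(RUNNING)) // prints "RUNNING"
}
```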
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type AsyncHTTPCrawler ¶
type AsyncHTTPCrawler struct {
// contains filtered or unexported fields
}
AsyncHTTPCrawler is an implementation of the Crawler interface. It contains a fetcher that initiates the crawling and zero or more workers that perform the processing
func NewAsyncHTTPCrawler ¶
func NewAsyncHTTPCrawler(seedURL *url.URL) *AsyncHTTPCrawler
NewAsyncHTTPCrawler is a constructor. It takes the seed URL from which the crawl will start; the crawler's fetcher initiates the crawl and its workers process the responses to create a Sitemap
func (*AsyncHTTPCrawler) Crawl ¶
func (c *AsyncHTTPCrawler) Crawl() (sitemap.Sitemapper, error)
Crawl is the main entrypoint to crawling a domain (URL). Crawl returns a Sitemapper that can later be used to create a representation of the crawled site. It returns an error in case the crawl URL is invalid
type AsyncHTTPFetcher ¶
type AsyncHTTPFetcher struct {
    // AsyncHTTPFetcher is an Asynchronous Worker
    *AsyncWorker
    // contains filtered or unexported fields
}
AsyncHTTPFetcher implements Fetcher
func NewAsyncHTTPFetcher ¶
func NewAsyncHTTPFetcher() *AsyncHTTPFetcher
NewAsyncHTTPFetcher is a constructor for an AsyncHTTPFetcher. It does not start the Fetcher, which should be done by using the Run method
func (*AsyncHTTPFetcher) Fetch ¶
func (a *AsyncHTTPFetcher) Fetch(url *url.URL) error
Fetch places a request for a URL into the requestQueue. It returns nil on success and an error in case the URL is not valid
func (*AsyncHTTPFetcher) ResponseChannel ¶
func (a *AsyncHTTPFetcher) ResponseChannel() (responseQueue *FetchResponseQueue)
ResponseChannel is a Getter returning the Fetcher's Channel that consumers should be receiving results from
func (*AsyncHTTPFetcher) Run ¶
func (a *AsyncHTTPFetcher) Run() error
Run starts a loop that waits for requests or the quit signal. Run will be interrupted once the Stop method is used
func (*AsyncHTTPFetcher) Worker ¶
func (a *AsyncHTTPFetcher) Worker() Worker
Worker returns the embedded AsyncWorker struct, which is used to Run and Stop the fetcher worker
type AsyncHTTPParser ¶
type AsyncHTTPParser struct {
    // AsyncHTTPParser is an Asynchronous Worker
    *AsyncWorker
    // contains filtered or unexported fields
}
func NewAsyncHTTPParser ¶
func NewAsyncHTTPParser(seedURL *url.URL, fetcher Fetcher) *AsyncHTTPParser
func (*AsyncHTTPParser) ResponseChannel ¶
func (p *AsyncHTTPParser) ResponseChannel() *parserResponseQueue
func (*AsyncHTTPParser) Run ¶
func (p *AsyncHTTPParser) Run() error
Run starts a loop that waits for requests or the quit signal. Run will be interrupted once the Stop method is used
func (*AsyncHTTPParser) Worker ¶
func (p *AsyncHTTPParser) Worker() Worker
Worker returns the embedded AsyncWorker struct, which is used to Run and Stop the Parser worker
type AsyncHttpTracker ¶
type AsyncHttpTracker struct {
    // AsyncHttpTracker is an Asynchronous Worker
    *AsyncWorker
    // contains filtered or unexported fields
}
An AsyncHttpTracker is an Asynchronous worker struct that is responsible for receiving URLs from a Parser and passing the uncrawled URLs to the Fetcher
func NewAsyncHttpTracker ¶
func NewAsyncHttpTracker(fetcher Fetcher, parser Parser) *AsyncHttpTracker
func (*AsyncHttpTracker) Run ¶
func (t *AsyncHttpTracker) Run() error
func (*AsyncHttpTracker) SetSitemapper ¶
func (t *AsyncHttpTracker) SetSitemapper(s sitemap.Sitemapper)
SetSitemapper provides the Tracker with a Sitemapper. The Tracker is responsible for providing the Sitemapper with new URL data
func (*AsyncHttpTracker) Worker ¶
func (t *AsyncHttpTracker) Worker() Worker
type AsyncWorker ¶
type AsyncWorker struct {
    RunFunc func() error
    Quit    chan uint8
    Name    string
    // contains filtered or unexported fields
}
AsyncWorker implements the Worker interface. It is meant to be embedded in another struct, like AsyncHTTPFetcher
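The embedding pattern means structs like the fetcher get the worker's state-management methods for free. A self-contained sketch using simplified local stand-ins (not the package's actual types):

```go
package main

import "fmt"

// asyncWorker is a simplified local stand-in for the package's AsyncWorker.
type asyncWorker struct {
    Name  string
    state uint8
}

func (w *asyncWorker) SetState(s uint8) { w.state = s }
func (w *asyncWorker) State() uint8     { return w.state }
func (w *asyncWorker) Type() string     { return w.Name }

// fetcher embeds *asyncWorker, so SetState, State and Type
// are promoted onto fetcher without any extra code.
type fetcher struct {
    *asyncWorker
}

func main() {
    f := &fetcher{asyncWorker: &asyncWorker{Name: "Fetcher"}}
    f.SetState(2) // RUNNING
    fmt.Println(f.Type(), f.State())
}
```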
func NewAsyncWorker ¶
func NewAsyncWorker(name string) *AsyncWorker
NewAsyncWorker is a constructor for an AsyncWorker.
func (*AsyncWorker) SetState ¶
func (w *AsyncWorker) SetState(state uint8)
SetState is a setter (see the interface definition)
func (*AsyncWorker) State ¶
func (w *AsyncWorker) State() uint8
State is a getter (see the interface definition)
func (*AsyncWorker) Stop ¶
func (w *AsyncWorker) Stop()
Stop notifies the quit channel. The encapsulating struct's RunFunc needs to receive from the quit channel in order to stop.
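The Stop/RunFunc handshake described above is the standard quit-channel pattern: the run loop selects on both the work channel and the quit channel. A self-contained sketch of that pattern (the `runWorker` function and its channels are illustrative, not the package's exact fields):

```go
package main

import "fmt"

// runWorker starts a worker goroutine that selects on a work channel
// and a quit channel, mirroring the Stop/RunFunc handshake.
// It returns the ordered event log.
func runWorker(requests []string) []string {
    quit := make(chan uint8)
    work := make(chan string) // unbuffered: each send rendezvouses with the worker
    events := make(chan string, len(requests)+1)
    done := make(chan struct{})

    go func() {
        defer close(done)
        for {
            select {
            case req := <-work:
                events <- "processed " + req
            case <-quit:
                events <- "stopped"
                return
            }
        }
    }()

    for _, r := range requests {
        work <- r
    }
    quit <- 0 // what Stop effectively does: notify the quit channel
    <-done
    close(events)

    var log []string
    for e := range events {
        log = append(log, e)
    }
    return log
}

func main() {
    fmt.Println(runWorker([]string{"https://example.com"}))
}
```

Because the work channel is unbuffered, every request is handed to the worker before the quit signal is sent, so the shutdown is orderly.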
func (*AsyncWorker) Type ¶
func (w *AsyncWorker) Type() string
Type returns the Name given to the Worker at initialisation
type Crawler ¶
type Crawler interface {
    // Crawl is the main entrypoint to
    // crawling a domain (url)
    Crawl(url string) (sitemap.Sitemapper, error)
}
A Crawler crawls a domain and returns a representation of the crawled domain
type FetchMessage ¶
FetchMessage is a struct used to pass the result of a Fetch request back to the requester. It includes the original Request (for tracking), the Response, and an Error in case the request could not finish successfully
type FetchResponseQueue ¶
type FetchResponseQueue chan *FetchMessage
FetchResponseQueue is used for outgoing responses from the Fetcher
type Fetcher ¶
type Fetcher interface {
    // Fetch provides work to the Fetcher, in the
    // form of a URL to process
    Fetch(url *url.URL) error

    // ResponseChannel is a Getter returning
    // the Fetcher's Channel that consumers
    // should be receiving results from
    ResponseChannel() (responseQueue *FetchResponseQueue)

    // Retrieve Worker that manages Fetcher Service
    Worker() Worker
}
Fetcher is an Asynchronous Worker interface that is responsible for Fetching URLs and exposing a ResponseChannel where the results of type FetchMessage are passed to the consumers
type HTPPClient ¶
type HTPPClient interface {
    // At the moment, response is of type http.Response which locks
    // in implementation!
    Get(url string) (resp *http.Response, err error)
}
HTPPClient interface that wraps around the http.Client struct and can be replaced by any other client implementation
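Because the client is an interface, tests can substitute a stub for the real http.Client. A self-contained sketch; the `stubClient` and `fetchBody` names are hypothetical (the interface identifier is spelled HTPPClient, as in the source):

```go
package main

import (
    "fmt"
    "io"
    "net/http"
    "strings"
)

// HTPPClient mirrors the package's interface.
type HTPPClient interface {
    Get(url string) (resp *http.Response, err error)
}

// stubClient is a hypothetical test double that returns a canned
// response without touching the network.
type stubClient struct {
    body string
}

func (s *stubClient) Get(url string) (*http.Response, error) {
    return &http.Response{
        StatusCode: http.StatusOK,
        Body:       io.NopCloser(strings.NewReader(s.body)),
    }, nil
}

// fetchBody uses whatever HTPPClient it is given, so real and stub
// clients are interchangeable.
func fetchBody(c HTPPClient, url string) (string, error) {
    resp, err := c.Get(url)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    b, err := io.ReadAll(resp.Body)
    return string(b), err
}

func main() {
    body, _ := fetchBody(&stubClient{body: "<html></html>"}, "https://example.com")
    fmt.Println(body)
}
```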
type Parser ¶
type Parser interface {
    // ResponseChannel is a Getter returning
    // the Parser's Channel that consumers
    // should be receiving results from
    ResponseChannel() (responseQueue *parserResponseQueue)

    // Retrieve Worker
    Worker() Worker
}
Parser is an Asynchronous Worker interface that exposes a ResponseChannel from which consumers receive parsed results
type RequestQueue ¶
RequestQueue is used for incoming requests to the fetcher
type Tracker ¶
type Tracker interface {
    // SetSitemapper provides the Tracker with
    // a Sitemapper. The Tracker is responsible for
    // providing the Sitemapper with
    // new URL data.
    SetSitemapper(sitemap.Sitemapper)

    // Retrieve Worker
    Worker() Worker
}
A Tracker is an Asynchronous worker interface that is responsible for receiving URLs from a Parser and passing the uncrawled URLs to the Fetcher
type Worker ¶
type Worker interface {
    // Run starts the Asynchronous worker
    Run() error

    // Returns worker name
    // Example names are:
    // - Fetcher
    // - Parser
    // - Tracker
    // - Sitemapper
    Type() string

    // State returns the state the worker is in:
    // RUNNING - processing work
    // WAITING - Waits for work
    // STOPPED - Not running
    State() uint8

    SetState(state uint8)
}
Worker is an interface that can be used to manage agents that perform work in a separate goroutine.