Documentation
Types
type Crawler
type Crawler struct {
	Name string // Name of crawler for easy identification
	*CrawlerConfig
}
Crawler crawls the URLs fetched from Queue and saves their contents to Models.
A Crawler quits after IdleTimeout once the queue is empty.
func NNewCrawlers
func NNewCrawlers(n int, namePrefix string, cfg *CrawlerConfig) ([]*Crawler, error)
NNewCrawlers returns N new Crawlers configured with cfg. Crawlers will be named with namePrefix.
func NewCrawler
func NewCrawler(name string, cfg *CrawlerConfig) (*Crawler, error)
NewCrawler returns a pointer to a new Crawler.
type CrawlerConfig
type CrawlerConfig struct {
	Queue            *queue.UniqueQueue // global queue
	Models           *models.Models     // models to use
	BaseURL          *url.URL           // base URL to crawl
	UserAgent        string             // user-agent to use while crawling
	MarkedURLs       []string           // marked URL to save to model
	IgnorePatterns   []string           // URL pattern to ignore
	RequestDelay     time.Duration      // delay between subsequent requests
	IdleTimeout      time.Duration      // timeout after which crawler quits when queue is empty
	Logger           *log.Logger        // will log to [os.Stdout] when nil and when no PrettyLogger; ONLY log to file if also using PrettyLogger
	RetryTimes       int                // no. of times to retry failed request
	FailedRequests   map[string]int     // map to store failed requests stats
	KnownInvalidURLs *InvalidURLCache   // known map of invalid URLs
	Ctx              context.Context    // context to quit on SIGINT/SIGTERM
	PrettyLogger     PrettyLogger       // optional logger to write to screen
	// contains filtered or unexported fields
}
CrawlerConfig holds the settings used to configure a Crawler.
type InvalidURLCache
type InvalidURLCache struct {
// contains filtered or unexported fields
}
InvalidURLCache is the cache for invalid URLs
type PrettyLogger (added in v0.8.7)
type PrettyLogger interface {
	// Log will send message to PrettyLogger instance
	// to be written on terminal
	Log(string)
	// Quit should initiate call to quit PrettyLogger when
	// the last crawler has exited
	Quit()
}
The PrettyLogger interface is used to write scrolling logs to the terminal.