Documentation
Index

type Crawler
	func NNewCrawlers(n int, namePrefix string, cfg *CrawlerConfig) ([]*Crawler, error)
	func NewCrawler(name string, cfg *CrawlerConfig) (*Crawler, error)
type CrawlerConfig
type InvalidURLCache
Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types
type Crawler

type Crawler struct {
	Name string // Name of crawler for easy identification
	*CrawlerConfig
}
Crawler crawls URLs fetched from Queue and saves the contents to Models. A Crawler quits after IdleTimeout when the queue is empty.
func NNewCrawlers
func NNewCrawlers(n int, namePrefix string, cfg *CrawlerConfig) ([]*Crawler, error)
NNewCrawlers returns n new Crawlers configured with cfg. Crawlers are named with namePrefix.
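A minimal sketch of creating a pool of Crawlers; cfg is assumed to be a *CrawlerConfig populated as shown under type CrawlerConfig below, and the exact name derived from namePrefix is assumed for illustration:

	crawlers, err := NNewCrawlers(4, "crawler", cfg)
	if err != nil {
		log.Fatal(err)
	}
	for _, c := range crawlers {
		fmt.Println(c.Name) // derived from namePrefix; exact naming scheme assumed
	}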
func NewCrawler
func NewCrawler(name string, cfg *CrawlerConfig) (*Crawler, error)
NewCrawler returns a pointer to a new Crawler configured with cfg.
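A minimal sketch for a single Crawler, again assuming cfg is a prepared *CrawlerConfig (see below):

	c, err := NewCrawler("crawler-1", cfg)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("created crawler %s", c.Name)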
type CrawlerConfig

type CrawlerConfig struct {
	Queue            *queue.UniqueQueue // global queue
	Models           *models.Models     // models to use
	BaseURL          *url.URL           // base URL to crawl
	UserAgent        string             // user-agent to use while crawling
	MarkedURLs       []string           // marked URLs to save to models
	IgnorePatterns   []string           // URL patterns to ignore
	RequestDelay     time.Duration      // delay between subsequent requests
	IdleTimeout      time.Duration      // timeout after which the crawler quits when the queue is empty
	Log              *log.Logger        // logger to use
	RetryTimes       int                // number of times to retry a failed request
	FailedRequests   map[string]int     // map to store failed request stats
	KnownInvalidURLs *InvalidURLCache   // cache of known invalid URLs
	Ctx              context.Context    // context to quit on SIGINT/SIGTERM
	// contains filtered or unexported fields
}
CrawlerConfig configures a Crawler.
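A sketch of populating CrawlerConfig; q and m are assumed to be initialized elsewhere (the constructors and import paths for the queue and models packages are not shown on this page), the URL and pattern values are illustrative, and KnownInvalidURLs is left at its zero value because no exported constructor for InvalidURLCache is documented:

	import (
		"context"
		"log"
		"net/url"
		"os"
		"time"
	)

	// newConfig is a hypothetical helper, not part of this package.
	func newConfig(q *queue.UniqueQueue, m *models.Models) (*CrawlerConfig, error) {
		base, err := url.Parse("https://example.com") // hypothetical site to crawl
		if err != nil {
			return nil, err
		}
		return &CrawlerConfig{
			Queue:          q,
			Models:         m,
			BaseURL:        base,
			UserAgent:      "examplebot/1.0",   // assumed user-agent string
			MarkedURLs:     []string{"/docs/"}, // assumed patterns to save
			IgnorePatterns: []string{"/login"}, // assumed patterns to skip
			RequestDelay:   500 * time.Millisecond,
			IdleTimeout:    10 * time.Second,
			Log:            log.New(os.Stderr, "crawler ", log.LstdFlags),
			RetryTimes:     3,
			FailedRequests: make(map[string]int),
			Ctx:            context.Background(),
		}, nil
	}

In a real program, Ctx would typically come from signal.NotifyContext in the os/signal package, so the crawler stops on SIGINT/SIGTERM as the field's comment describes.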
type InvalidURLCache
type InvalidURLCache struct {
// contains filtered or unexported fields
}
InvalidURLCache is a cache of known invalid URLs.