crawler

package
v1.0.0
Published: Jan 4, 2024 License: GPL-3.0 Imports: 19 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewCollector

func NewCollector(config configs.Crawler, torConfig configs.TorConfig) *colly.Collector

NewCollector wraps colly.NewCollector, applying the modifications needed for extensive crawling.
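One of the modifications a Tor-aware collector needs is routing traffic through a SOCKS5 proxy. The real setup lives in configs.TorConfig and colly; the sketch below shows only the underlying stdlib idea, with the proxy address as a hypothetical placeholder for Tor's default local endpoint.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// torTransport builds an http.Transport that sends every request through
// a SOCKS5 proxy. The address is a placeholder for Tor's conventional
// local endpoint; the actual value would come from configs.TorConfig.
func torTransport(proxyAddr string) (*http.Transport, error) {
	proxyURL, err := url.Parse(proxyAddr)
	if err != nil {
		return nil, err
	}
	return &http.Transport{Proxy: http.ProxyURL(proxyURL)}, nil
}

func main() {
	tr, err := torTransport("socks5://127.0.0.1:9050")
	if err != nil {
		panic(err)
	}
	// A colly.Collector can be given such a transport via WithTransport.
	client := &http.Client{Transport: tr}
	_ = client
	fmt.Println("transport configured")
}
```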

func ParseResponse

func ParseResponse(url string, body string, response *colly.Response) (int, error)

ParseResponse creates a record for the crawled page in the database's web_pages table.
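Before persisting a web_pages record, some fields have to be pulled out of the raw response body. The helper below is purely hypothetical, a stdlib sketch of the kind of extraction ParseResponse might perform; the package's actual parsing and persistence logic is not shown on this page.

```go
package main

import (
	"fmt"
	"regexp"
)

// extractTitle is a hypothetical helper: it pulls the <title> text out of
// an HTML body, the sort of field a web_pages record would store.
func extractTitle(body string) string {
	re := regexp.MustCompile(`(?is)<title>(.*?)</title>`)
	m := re.FindStringSubmatch(body)
	if len(m) < 2 {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(extractTitle("<html><title>Example</title></html>"))
}
```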

Types

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

func NewCrawler

func NewCrawler(torConf configs.TorConfig, crawlerConf configs.Crawler) *Crawler

NewCrawler initializes a new crawler and adds URLs to the queue.

func (*Crawler) Crawl

func (c *Crawler) Crawl() error

Crawl starts the crawling process; the entrypoints are the StartingURLs in the config. The OnHTML callbacks are defined only here because it is easier to handle maxSize errors in this method: they are returned to the caller, and the application exits.
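The documented pattern is that Crawl surfaces any maxSize error to the caller, who then terminates the application. A minimal sketch of that calling pattern, with crawl standing in for the real (*Crawler).Crawl method and an invented error message:

```go
package main

import (
	"errors"
	"fmt"
)

// crawl stands in for (*Crawler).Crawl: any maxSize error raised inside
// the OnHTML callbacks is returned to the caller rather than swallowed.
// The error text here is illustrative only.
func crawl() error {
	return errors.New("response body exceeds maxSize")
}

func main() {
	// Mirrors the documented usage: the caller receives the error and
	// exits the application.
	if err := crawl(); err != nil {
		fmt.Println("crawl failed:", err)
		return
	}
	fmt.Println("crawl finished")
}
```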
