crawler

package
v1.0.0
Published: Jan 4, 2024 License: GPL-3.0 Imports: 19 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewCollector

func NewCollector(config configs.Crawler, torConfig configs.TorConfig) *colly.Collector

NewCollector wraps colly.NewCollector, applying the modifications needed for extensive crawling.
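One of the modifications a Tor-aware collector needs is routing traffic through a SOCKS5 proxy. The real setup lives in configs.TorConfig and colly; the sketch below shows only the underlying stdlib idea, with the proxy address as a hypothetical placeholder for Tor's default local endpoint.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// torTransport builds an http.Transport that sends every request through
// a SOCKS5 proxy. The address is a placeholder for Tor's conventional
// local endpoint; the actual value would come from configs.TorConfig.
func torTransport(proxyAddr string) (*http.Transport, error) {
	proxyURL, err := url.Parse(proxyAddr)
	if err != nil {
		return nil, err
	}
	return &http.Transport{Proxy: http.ProxyURL(proxyURL)}, nil
}

func main() {
	tr, err := torTransport("socks5://127.0.0.1:9050")
	if err != nil {
		panic(err)
	}
	// A colly.Collector can be given such a transport via WithTransport.
	client := &http.Client{Transport: tr}
	_ = client
	fmt.Println("transport configured")
}
```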

func ParseResponse

func ParseResponse(url string, body string, response *colly.Response) (int, error)

ParseResponse creates a record for the crawled page in the database's web_pages table.
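Before persisting a web_pages record, some fields have to be pulled out of the raw response body. The helper below is purely hypothetical, a stdlib sketch of the kind of extraction ParseResponse might perform; the package's actual parsing and persistence logic is not shown on this page.

```go
package main

import (
	"fmt"
	"regexp"
)

// extractTitle is a hypothetical helper: it pulls the <title> text out of
// an HTML body, the sort of field a web_pages record would store.
func extractTitle(body string) string {
	re := regexp.MustCompile(`(?is)<title>(.*?)</title>`)
	m := re.FindStringSubmatch(body)
	if len(m) < 2 {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(extractTitle("<html><title>Example</title></html>"))
}
```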

Types

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

func NewCrawler

func NewCrawler(torConf configs.TorConfig, crawlerConf configs.Crawler) *Crawler

NewCrawler initializes a new crawler and adds URLs to the queue.

func (*Crawler) Crawl

func (c *Crawler) Crawl() error

Crawl starts the crawling process; the entrypoints are the StartingURLs in the config. The OnHTML callbacks are defined only here because it is easier to handle maxSize errors in this method: they are returned to the caller, and the application exits.
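The documented pattern is that Crawl surfaces any maxSize error to the caller, who then terminates the application. A minimal sketch of that calling pattern, with crawl standing in for the real (*Crawler).Crawl method and an invented error message:

```go
package main

import (
	"errors"
	"fmt"
)

// crawl stands in for (*Crawler).Crawl: any maxSize error raised inside
// the OnHTML callbacks is returned to the caller rather than swallowed.
// The error text here is illustrative only.
func crawl() error {
	return errors.New("response body exceeds maxSize")
}

func main() {
	// Mirrors the documented usage: the caller receives the error and
	// exits the application.
	if err := crawl(); err != nil {
		fmt.Println("crawl failed:", err)
		return
	}
	fmt.Println("crawl finished")
}
```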
