crawler

package
v0.0.0-...-2954126
Warning

This package is not in the latest version of its module.

Published: Nov 7, 2018 | License: MIT | Imports: 13 | Imported by: 1

Documentation


Constants

const (
	WORKER_COUNT          = 40
	WORKER_SLEEP_TIME_SEC = 4 // seconds
)
const (
	USER_AGENT             = "Googlebot"
	ROBOTS_REQUEST_TIMEOUT = 2 // seconds
	ROBOTS_PATH            = "/robots.txt"
)

Variables

This section is empty.

Functions

func Crawl

func Crawl(seedPage url.URL, topicKeywords []string, targetCount int, timeLimit time.Duration) []url.URL

func GetPage

func GetPage(pageURL url.URL) (string, error)

func GetRobotsData

func GetRobotsData(host string) (*robotstxt.RobotsData, error)

func ShouldCrawl

func ShouldCrawl(url url.URL) (bool, error)

Types

This section is empty.

