Documentation ¶
Index ¶
- Constants
- func Crawl(profile *CrawlerProfile) (err error)
- func GetRobots(ctx context.Context, website, userAgent string, limiter *RequestLimiter) (*robotstxt.RobotsData, error)
- func TestRobotsGroup(robots *robotstxt.RobotsData, url, userAgent string) bool
- type CrawlerProfile
- type RequestLimiter
- type VisitMap
Constants ¶
const (
	NoFollow        = "nofollow"
	RobotsTxt       = "robots.txt"
	UserAgentHeader = "User-Agent"
)
Variables ¶
This section is empty.
Functions ¶
func Crawl ¶
func Crawl(profile *CrawlerProfile) (err error)
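Below is a minimal sketch of driving the crawler. The import path and alias (crawler), the target URL, and the limit values are illustrative assumptions, and Crawl is assumed to block until the crawl finishes or MaxRuntime elapses.

package main

import (
	"context"
	"log"
	"time"

	crawler "example.com/crawler" // hypothetical import path
)

func main() {
	profile := &crawler.CrawlerProfile{
		Ctx:        context.Background(),
		Website:    "https://example.com", // illustrative target
		UserAgent:  "example-bot/1.0",
		MaxDepth:   3,
		MaxRuntime: 5 * time.Minute,
		// Log each visited URL via the URLHooks callbacks.
		URLHooks: []func(url string){
			func(url string) { log.Println("visited:", url) },
		},
	}
	if err := crawler.Crawl(profile); err != nil {
		log.Fatal(err)
	}
}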
func GetRobots ¶
func GetRobots(ctx context.Context, website, userAgent string, limiter *RequestLimiter) (*robotstxt.RobotsData, error)
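A sketch of fetching a site's robots.txt before crawling, assuming GetRobots resolves robots.txt relative to the website argument and paces the request through the limiter. The crawler alias and the seconds interpretation of the limiter delay are assumptions.

robots, err := crawler.GetRobots(
	context.Background(),
	"https://example.com", // site whose robots.txt to fetch
	"example-bot/1.0",     // User-Agent sent with the request
	crawler.NewLimiter(2), // assumed delay between requests, in seconds
)
if err != nil {
	log.Fatal(err)
}
// robots is a *robotstxt.RobotsData (presumably github.com/temoto/robotstxt).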
func TestRobotsGroup ¶
func TestRobotsGroup(robots *robotstxt.RobotsData, url, userAgent string) bool
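Continuing the sketch above, TestRobotsGroup can gate individual fetches; it is assumed here that a true result means the robots group matching userAgent allows the URL.

if crawler.TestRobotsGroup(robots, "https://example.com/private/", "example-bot/1.0") {
	// Allowed by robots.txt: proceed with the request.
} else {
	// Disallowed: skip this URL.
}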
Types ¶
type CrawlerProfile ¶
type CrawlerProfile struct {
	Ctx       context.Context
	Website   string
	UserAgent string

	// Limits
	MaxDepth   int
	MaxRuntime time.Duration

	// Colly configuration
	CollyOptions []colly.CollectorOption
	CollyLimits  *colly.LimitRule

	// Custom callbacks
	ResponseHooks []func(response *colly.Response)
	URLTests      []func(url string) bool
	URLHooks      []func(url string)
}
type RequestLimiter ¶
type RequestLimiter struct {
	SleepDelay int
	// contains filtered or unexported fields
}
func NewLimiter ¶
func NewLimiter(sleepDelay int) *RequestLimiter
func (*RequestLimiter) Decrease ¶
func (r *RequestLimiter) Decrease()
func (*RequestLimiter) Increase ¶
func (r *RequestLimiter) Increase()
func (*RequestLimiter) Sleep ¶
func (r *RequestLimiter) Sleep()
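The limiter can also be used on its own to pace arbitrary requests. A sketch under stated assumptions: SleepDelay is treated as a base delay in seconds, Sleep blocks for the current delay, and Increase/Decrease adjust it for adaptive back-off; none of these semantics are documented above. It assumes the same crawler alias as earlier, plus net/http for the illustrative fetch.

func fetchAll(urls []string) {
	limiter := crawler.NewLimiter(1) // base delay, assumed seconds
	for _, u := range urls {
		limiter.Sleep() // wait before each request
		resp, err := http.Get(u)
		if err != nil {
			limiter.Increase() // back off after a failure (assumed semantics)
			continue
		}
		resp.Body.Close()
		limiter.Decrease() // relax the delay after a success (assumed semantics)
	}
}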
Source Files ¶