Documentation

Overview
A package for parsing robots.txt files.
Index

Constants
This section is empty.
Variables
This section is empty.
Functions
func CleanInput
func GetRobotsTxtUrl
GetRobotsTxtUrl returns the location of robots.txt given a URL that points to somewhere on the server.
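A minimal usage sketch, assuming GetRobotsTxtUrl accepts the URL as a string and returns the robots.txt location as a string (the signature is not shown on this page, and the import path below is a placeholder):

package main

import (
	"fmt"

	robotstxt "github.com/example/robotstxt" // hypothetical import path
)

func main() {
	// Assumed signature: func GetRobotsTxtUrl(u string) string
	// Given any page on a server, derive the conventional robots.txt location.
	fmt.Println(robotstxt.GetRobotsTxtUrl("https://example.com/blog/post/42"))
	// Expected: https://example.com/robots.txt
}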
Types
type RobotsTxt
type RobotsTxt struct {
	DisallowAll, AllowAll bool
	// User-agents to disallowed URLs
	Rules Rules
	Url   *url.URL
	// contains filtered or unexported fields
}
func NewRobotsTxtFromUrl
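No description or signature is given for NewRobotsTxtFromUrl on this page; the sketch below assumes it takes a URL string, fetches and parses that server's robots.txt, and returns (*RobotsTxt, error):

package main

import (
	"log"

	robotstxt "github.com/example/robotstxt" // hypothetical import path
)

func main() {
	// Assumed signature: func NewRobotsTxtFromUrl(u string) (*RobotsTxt, error)
	robots, err := robotstxt.NewRobotsTxtFromUrl("https://example.com/")
	if err != nil {
		log.Fatal(err)
	}
	// DisallowAll and AllowAll are exported fields of RobotsTxt (see above).
	if robots.DisallowAll {
		log.Println("example.com disallows all crawling")
	}
}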
func (*RobotsTxt) Allowed
Allowed asks whether a specific UserAgent is allowed to crawl a given URL.
BUG(ChuckHa): Will fail when UserAgent: * and Disallow: / followed by UserAgent: Squidbot and Disallow:
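Continuing the sketch above, and assuming Allowed takes a user-agent string and the URL to be crawled and returns a bool:

// Assumed signature: func (r *RobotsTxt) Allowed(userAgent, url string) bool
if robots.Allowed("Squidbot", "https://example.com/secret/page") {
	// fetch the page
} else {
	// the site's robots.txt disallows this user agent here
}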
func (*RobotsTxt) GetRobotsTxtFromUrl
GetRobotsTxtFromUrl fetches the contents of a robots.txt file from the given URL.
func (*RobotsTxt) NotAllowed
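No description is given for NotAllowed; assuming it is simply the complement of Allowed, usage would mirror the sketch above:

// Assumed signature: func (r *RobotsTxt) NotAllowed(userAgent, url string) bool
if robots.NotAllowed("Squidbot", "https://example.com/secret/page") {
	// skip this URL for this user agent
}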
Notes

Bugs
Will fail when UserAgent: * and Disallow: / followed by UserAgent: Squidbot and Disallow:
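The failing case corresponds to a robots.txt along these lines, where a blanket disallow for every agent is followed by an empty (allow-everything) Disallow rule for one specific agent:

User-agent: *
Disallow: /

User-agent: Squidbot
Disallow: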