Documentation ¶
Overview ¶
Package robotstxt parses robots.txt files.
It aims to follow the Google robots.txt specification; see https://developers.google.com/search/reference/robots_txt for more information.
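A minimal usage sketch follows. It is hedged: the import path is a placeholder, the "my-crawler" user agent and example.com URLs are made up, and Parse is assumed to take the file contents plus the URL the robots.txt was fetched from; check the Parse documentation below for the exact signature.

package main

import (
	"fmt"
	"log"

	"example.com/robotstxt" // placeholder import path; use the package's real module path
)

func main() {
	contents := `User-agent: *
Disallow: /private/
Crawl-delay: 5
Sitemap: https://www.example.com/sitemap.xml`

	// Assumed Parse signature: the contents of the file plus the URL it was
	// fetched from, so path rules can be checked against full URLs.
	robots, err := robotstxt.Parse(contents, "https://www.example.com/robots.txt")
	if err != nil {
		log.Fatal(err)
	}

	allowed, err := robots.IsAllowed("my-crawler", "https://www.example.com/private/page.html")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("allowed:", allowed) // expected false: /private/ is disallowed for all agents
}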
Index ¶
- func NormaliseUserAgent(userAgent string) string
- type InvalidHostError
- type RobotsTxt
- func (r *RobotsTxt) AddPathRule(userAgent string, path string, isAllowed bool) error
- func (r *RobotsTxt) CrawlDelay(userAgent string) time.Duration
- func (r *RobotsTxt) Host() string
- func (r *RobotsTxt) IsAllowed(userAgent string, urlStr string) (result bool, err error)
- func (r *RobotsTxt) Sitemaps() []string
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func NormaliseUserAgent ¶
func NormaliseUserAgent(userAgent string) string
NormaliseUserAgent normalizes a user agent.
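A small sketch of calling it. The import path is a placeholder, and the exact normalisation rules are an assumption (typically the name is lower-cased and any version suffix is dropped).

package main

import (
	"fmt"

	"example.com/robotstxt" // placeholder import path; use the package's real module path
)

func main() {
	// Assumed behaviour: matching is case-insensitive and ignores any
	// "/version" suffix, so "Googlebot/2.1" normalises to something like "googlebot".
	fmt.Println(robotstxt.NormaliseUserAgent("Googlebot/2.1"))
}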
Types ¶
type InvalidHostError ¶
type InvalidHostError struct{}
InvalidHostError is the error returned when a URL checked with IsAllowed is not valid for this robots.txt file.
func (InvalidHostError) Error ¶
func (e InvalidHostError) Error() string
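A hedged sketch of distinguishing a host mismatch from other failures. The checkURL helper, the "my-crawler" user agent, and the import path are hypothetical, and it assumes IsAllowed returns the error by value, matching the value receiver on Error.

package crawler

import (
	"errors"
	"fmt"

	"example.com/robotstxt" // placeholder import path; use the package's real module path
)

// checkURL reports whether urlStr may be crawled, treating a host mismatch
// as a distinct, non-fatal case.
func checkURL(robots *robotstxt.RobotsTxt, urlStr string) (bool, error) {
	allowed, err := robots.IsAllowed("my-crawler", urlStr)
	if err != nil {
		var hostErr robotstxt.InvalidHostError
		// Assumption: the error is returned by value.
		if errors.As(err, &hostErr) {
			fmt.Printf("%s is on a different host than this robots.txt\n", urlStr)
			return false, nil
		}
		return false, err
	}
	return allowed, nil
}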
type RobotsTxt ¶
type RobotsTxt struct {
	// contains filtered or unexported fields
}
RobotsTxt represents a parsed robots.txt file.
func Parse ¶
Parse parses the contents of a robots.txt file and returns a RobotsTxt struct that can be used to check whether URLs can be crawled, or to extract crawl delays, sitemaps, or the preferred host name.
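Once parsed (see the sketch in the overview), the struct's accessors expose those directives. A short sketch; the describe helper, the "my-crawler" user agent, and the import path are hypothetical.

package crawler

import (
	"fmt"

	"example.com/robotstxt" // placeholder import path; use the package's real module path
)

// describe prints the directives extracted from an already-parsed robots.txt.
func describe(robots *robotstxt.RobotsTxt) {
	fmt.Println("preferred host:", robots.Host())
	for _, sitemap := range robots.Sitemaps() {
		fmt.Println("sitemap:", sitemap)
	}
	fmt.Println("crawl delay for my-crawler:", robots.CrawlDelay("my-crawler"))
}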
func (*RobotsTxt) AddPathRule ¶
func (r *RobotsTxt) AddPathRule(userAgent string, path string, isAllowed bool) error
AddPathRule adds another path rule for the given user agent, allowing or disallowing the path according to isAllowed.
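A sketch of layering extra rules on top of a parsed robots.txt, for example to enforce a crawler-side blocklist. The addExtraRules helper, the paths, the "my-crawler" user agent, and the import path are hypothetical.

package crawler

import (
	"example.com/robotstxt" // placeholder import path; use the package's real module path
)

// addExtraRules adds crawler-side allow/disallow rules on top of the parsed file.
func addExtraRules(robots *robotstxt.RobotsTxt) error {
	// Disallow /tmp/ for every user agent.
	if err := robots.AddPathRule("*", "/tmp/", false); err != nil {
		return err
	}
	// Explicitly allow /tmp/public/ for our own crawler.
	return robots.AddPathRule("my-crawler", "/tmp/public/", true)
}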
func (*RobotsTxt) CrawlDelay ¶
func (r *RobotsTxt) CrawlDelay(userAgent string) time.Duration
CrawlDelay returns the crawl delay for the specified user agent, or 0 if there is none.
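A sketch of using the returned delay to pace requests. The politeDelay helper, its fallback value, and the import path are hypothetical, not part of the package.

package crawler

import (
	"time"

	"example.com/robotstxt" // placeholder import path; use the package's real module path
)

// politeDelay returns how long to wait between requests for userAgent,
// falling back to a crawler-chosen default when robots.txt sets no delay.
func politeDelay(robots *robotstxt.RobotsTxt, userAgent string) time.Duration {
	if d := robots.CrawlDelay(userAgent); d > 0 {
		return d
	}
	return 2 * time.Second // hypothetical default, not part of the package
}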