Documentation ¶
Overview ¶
Package robotstxt implements the robots.txt Exclusion Protocol as specified in http://www.robotstxt.org/wc/robots.html with various extensions.
Index ¶
- Variables
- type Group
- type ParseError
- type RobotsData
- func FromBytes(body []byte) (r *RobotsData, err error)
- func FromResponse(res *http.Response) (*RobotsData, error)
- func FromStatusAndBytes(statusCode int, body []byte) (*RobotsData, error)
- func FromStatusAndString(statusCode int, body string) (*RobotsData, error)
- func FromString(body string) (r *RobotsData, err error)
- type Rule
Constants ¶
This section is empty.
Variables ¶
var WhitespaceChars = []rune{' ', '\t', '\v'}
Functions ¶
This section is empty.
Types ¶
type ParseError ¶
type ParseError struct {
	Errs []error
}
func (ParseError) Error ¶
func (e ParseError) Error() string
type RobotsData ¶
type RobotsData struct {
	// public
	Groups      map[string]*Group
	AllowAll    bool
	DisallowAll bool
	Host        string
	Sitemaps    []string
}
func FromBytes ¶
func FromBytes(body []byte) (r *RobotsData, err error)
func FromResponse ¶
func FromResponse(res *http.Response) (*RobotsData, error)
func FromStatusAndBytes ¶
func FromStatusAndBytes(statusCode int, body []byte) (*RobotsData, error)
func FromStatusAndString ¶
func FromStatusAndString(statusCode int, body string) (*RobotsData, error)
func FromString ¶
func FromString(body string) (r *RobotsData, err error)
func (*RobotsData) FindGroup ¶
func (r *RobotsData) FindGroup(agent string) (ret *Group)
FindGroup searches the groups of declarations for the one that applies to the specified user-agent.
From Google's spec: "Only one group of group-member records is valid for a particular crawler. The crawler must determine the correct group of records by finding the group with the most specific user-agent that still matches. All other groups of records are ignored by the crawler. The user-agent is non-case-sensitive. The order of the groups within the robots.txt file is irrelevant."
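The selection rule quoted above can be sketched using only the standard library. This is a minimal illustration, not the package's implementation: the group type, the findGroup helper, the use of substring matching, and the treatment of the longest matching key as the "most specific" one are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// group is a hypothetical stand-in for the package's Group type.
type group struct {
	name string
}

// findGroup returns the group whose user-agent key is the longest key
// contained in agent, compared case-insensitively. If no named key
// matches, it falls back to the "*" group (nil when there is none).
// Keys in groups are assumed to be lowercase already.
func findGroup(groups map[string]*group, agent string) *group {
	agent = strings.ToLower(agent)
	var best *group
	bestLen := 0
	for key, g := range groups {
		if key == "*" || key == "" {
			continue
		}
		// Map iteration order is random, which is fine here: the rule
		// says the order of groups is irrelevant.
		if strings.Contains(agent, key) && len(key) > bestLen {
			best, bestLen = g, len(key)
		}
	}
	if best == nil {
		return groups["*"]
	}
	return best
}

func main() {
	groups := map[string]*group{
		"*":              {name: "wildcard"},
		"googlebot":      {name: "googlebot"},
		"googlebot-news": {name: "googlebot-news"},
	}
	agents := []string{"Googlebot-News/2.1", "Googlebot/2.1", "SomeOtherBot"}
	for _, agent := range agents {
		fmt.Printf("%s -> %s\n", agent, findGroup(groups, agent).name)
	}
}
```

Lowercasing the agent once keeps the comparison case-insensitive, and preferring the longest matching key means "googlebot-news" wins over "googlebot" for a news crawler, while agents that match no named group fall through to the wildcard.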
func (*RobotsData) TestAgent ¶
func (r *RobotsData) TestAgent(path, agent string) bool