Documentation
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
func CompressSpace
CompressSpace reduces every run of whitespace (spaces, tabs, newlines, etc.) in a string to a single space and trims leading and trailing whitespace. In effect, it converts a multi-line string to a single line.
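The documented behaviour can be sketched with the standard library, assuming the signature func(string) string (the signature is not shown above). strings.Fields splits on runs of Unicode whitespace and drops leading and trailing whitespace, so rejoining the fields with one space gives the described result. This is an illustration, not the package's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// CompressSpace (sketch): strings.Fields splits s on every run of
// whitespace and discards leading/trailing whitespace, so joining the
// pieces with a single space collapses each run to one space.
func CompressSpace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	fmt.Printf("%q\n", CompressSpace("  hello\n\tworld   again \n")) // "hello world again"
}
```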
func GetAttr
GetAttr retrieves the value of an attribute on a node. It returns an empty string if the attribute doesn't exist.
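The node type GetAttr accepts is not shown in these docs. The sketch below assumes a tree loosely shaped like golang.org/x/net/html, using hypothetical stand-in types Node and Attribute, and a hypothetical (node, key) parameter order.

```go
package main

import "fmt"

// Attribute and Node are hypothetical stand-ins, loosely shaped like
// golang.org/x/net/html's Attribute and Node.
type Attribute struct {
	Key, Val string
}

type Node struct {
	Attr []Attribute
}

// GetAttr (sketch) returns the value of the first attribute named key
// on n, or "" when n has no such attribute.
func GetAttr(n *Node, key string) string {
	for _, a := range n.Attr {
		if a.Key == key {
			return a.Val
		}
	}
	return ""
}

func main() {
	link := &Node{Attr: []Attribute{{Key: "href", Val: "/news/1"}}}
	fmt.Println(GetAttr(link, "href"))         // /news/1
	fmt.Printf("%q\n", GetAttr(link, "title")) // ""
}
```

Because a missing attribute and an attribute with an empty value both yield "", callers that need to tell them apart would need a second return value.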
func GetTextContent
GetTextContent recursively fetches the text content of a node.
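A recursive text walk can be sketched as a depth-first traversal that appends text nodes in document order. The NodeType, Node, TextNode and ElementNode names are hypothetical stand-ins loosely modeled on golang.org/x/net/html's FirstChild/NextSibling tree; the real node type is not shown in these docs.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for an HTML node tree.
type NodeType int

const (
	TextNode NodeType = iota
	ElementNode
)

type Node struct {
	Type        NodeType
	Data        string // text for TextNode, tag name for ElementNode
	FirstChild  *Node
	NextSibling *Node
}

// GetTextContent (sketch) concatenates the text of n and all of its
// descendants, visiting children depth-first in document order.
func GetTextContent(n *Node) string {
	var sb strings.Builder
	var walk func(*Node)
	walk = func(m *Node) {
		if m.Type == TextNode {
			sb.WriteString(m.Data)
		}
		for c := m.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(n)
	return sb.String()
}

func main() {
	// Builds <p>Hello <b>world</b></p> by hand.
	b := &Node{Type: ElementNode, Data: "b", FirstChild: &Node{Type: TextNode, Data: "world"}}
	hello := &Node{Type: TextNode, Data: "Hello ", NextSibling: b}
	p := &Node{Type: ElementNode, Data: "p", FirstChild: hello}
	fmt.Println(GetTextContent(p)) // Hello world
}
```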
Types
type DiscoverStats
type Discoverer
type Discoverer struct {
	Name               string
	StartURL           url.URL
	ArtPats            []*regexp.Regexp
	BaseErrorThreshold int
	StripFragments     bool
	StripQuery         bool
	HostPat            *regexp.Regexp
	ErrorLog           Logger
	InfoLog            Logger
	Stats              DiscoverStats
}
func NewDiscoverer
func NewDiscoverer(cfg DiscovererDef) (*Discoverer, error)
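Comparing the field lists of DiscovererDef and Discoverer suggests NewDiscoverer turns string config into compiled form: URL into a url.URL, ArtPat and HostPat into regexps, and NoStripQuery into StripQuery. The sketch below is a guess at that mapping using trimmed, hypothetical copies of both types; newDiscovererSketch is a hypothetical name, and always enabling StripFragments is an assumption, since DiscovererDef has no matching option.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// Trimmed, hypothetical copies of the two types.
type DiscovererDef struct {
	Name               string
	URL                string
	ArtPat             []string
	BaseErrorThreshold int
	HostPat            string
	NoStripQuery       bool
}

type Discoverer struct {
	Name               string
	StartURL           url.URL
	ArtPats            []*regexp.Regexp
	BaseErrorThreshold int
	StripFragments     bool
	StripQuery         bool
	HostPat            *regexp.Regexp
}

// newDiscovererSketch compiles the string-based config. It is an
// illustration of the likely mapping, not NewDiscoverer's actual code.
func newDiscovererSketch(cfg DiscovererDef) (*Discoverer, error) {
	u, err := url.Parse(cfg.URL)
	if err != nil {
		return nil, fmt.Errorf("bad URL %q: %w", cfg.URL, err)
	}
	disc := &Discoverer{
		Name:               cfg.Name,
		StartURL:           *u,
		BaseErrorThreshold: cfg.BaseErrorThreshold,
		StripFragments:     true,              // assumption: always on
		StripQuery:         !cfg.NoStripQuery, // per the NoStripQuery comment
	}
	for _, pat := range cfg.ArtPat {
		re, err := regexp.Compile(pat)
		if err != nil {
			return nil, fmt.Errorf("bad ArtPat %q: %w", pat, err)
		}
		disc.ArtPats = append(disc.ArtPats, re)
	}
	if cfg.HostPat != "" { // an empty HostPat leaves HostPat nil
		re, err := regexp.Compile(cfg.HostPat)
		if err != nil {
			return nil, fmt.Errorf("bad HostPat %q: %w", cfg.HostPat, err)
		}
		disc.HostPat = re
	}
	return disc, nil
}

func main() {
	d, err := newDiscovererSketch(DiscovererDef{
		Name:   "example",
		URL:    "https://example.com/news/",
		ArtPat: []string{`/news/\d+`},
	})
	fmt.Println(d.StartURL.Host, len(d.ArtPats), d.StripQuery, err) // example.com 1 true <nil>
}
```

Compiling the patterns once up front means a bad regex is reported when the Discoverer is built rather than partway through a crawl.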
func (*Discoverer) CookArticleURL
type DiscovererDef
type DiscovererDef struct {
	Name   string
	URL    string
	ArtPat []string
	// BaseErrorThreshold is the starting number of HTTP errors to accept
	// before bailing out.
	// Error threshold formula: base + 10% of the successful request count.
	BaseErrorThreshold int
	// HostPat is a regex matching accepted domains.
	// If empty, everything on a different domain is rejected.
	HostPat string
	// If NoStripQuery is set, the query part of article URLs is not stripped.
	NoStripQuery bool
}
type LinkSet
LinkSet is a thin map wrapper for some set operations.
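The LinkSet definition is not shown, so the following is a hypothetical reconstruction of a "thin map wrapper" set. The string key type and the Add, Contains, Difference and Sorted methods are assumptions, chosen to show the kind of set operation a link crawler typically needs.

```go
package main

import (
	"fmt"
	"sort"
)

// LinkSet (hypothetical): a set of URLs backed by a map with empty values.
type LinkSet map[string]struct{}

func (s LinkSet) Add(link string) { s[link] = struct{}{} }

func (s LinkSet) Contains(link string) bool {
	_, ok := s[link]
	return ok
}

// Difference returns the links in s that are not in other, e.g. newly
// found links that have not been visited yet.
func (s LinkSet) Difference(other LinkSet) LinkSet {
	out := LinkSet{}
	for link := range s {
		if !other.Contains(link) {
			out.Add(link)
		}
	}
	return out
}

// Sorted returns the links in a stable order (map iteration order is random).
func (s LinkSet) Sorted() []string {
	links := make([]string, 0, len(s))
	for link := range s {
		links = append(links, link)
	}
	sort.Strings(links)
	return links
}

func main() {
	found := LinkSet{}
	for _, l := range []string{"/a", "/b", "/c", "/a"} {
		found.Add(l)
	}
	seen := LinkSet{"/b": {}}
	fmt.Println(found.Difference(seen).Sorted()) // [/a /c]
}
```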
type NullLogger
type NullLogger struct{}
func (NullLogger) Printf
func (l NullLogger) Printf(format string, v ...interface{})