Documentation ¶
Constants ¶
const (
	DiscoverRequestType string = "DISCOVER"
	ExtractRequestType  string = "EXTRACT"
)
Variables ¶
This section is empty.
Functions ¶
Types ¶
type Call ¶
Call wraps an http.Request and adds a RequestType, which should be either DiscoverRequestType or ExtractRequestType. A discover call is used only to find new URLs; the response to an extract call is stored locally for further processing.
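The relationship between a Call and its request type can be sketched as follows. This is only an illustration: the field names Request and RequestType and the NewCall helper are assumptions, since the documentation above does not show the struct's fields or a constructor.

```go
package main

import (
	"fmt"
	"net/http"
)

const (
	DiscoverRequestType string = "DISCOVER"
	ExtractRequestType  string = "EXTRACT"
)

// Call is a hypothetical stand-in for the package's Call type.
// The field names are assumptions made for this sketch.
type Call struct {
	Request     *http.Request
	RequestType string
}

// NewCall is a hypothetical helper that builds a GET Call and rejects
// request types other than the two documented constants.
func NewCall(url, requestType string) (*Call, error) {
	if requestType != DiscoverRequestType && requestType != ExtractRequestType {
		return nil, fmt.Errorf("unknown request type %q", requestType)
	}
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	return &Call{Request: req, RequestType: requestType}, nil
}

func main() {
	c, err := NewCall("https://example.com/products", DiscoverRequestType)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.RequestType, c.Request.URL)
}
```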
type Crawler ¶
Crawler crawls any URL and returns Data containing what it has found. It also implements attribute.Taggable, which allows it to tag that Data.
type Data ¶
Data holds the result of a Crawler.Crawl call: the data that was found, the Call itself, and a collection of newly found Calls.
type HtmlCrawler ¶
HtmlCrawler crawls http(s) URLs and returns their raw data. Because it uses http.Client.Do(), nothing is rendered, so data that a page loads through API calls will not be fetched. HtmlCrawler is safe for concurrent use and keeps a registry of all found URLs.
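How a concurrency-safe URL registry might look is sketched below. The documentation does not show how HtmlCrawler stores its URLs, so the urlRegistry type, its mutex-guarded map, and the Add and Len methods are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// urlRegistry is a hypothetical stand-in for the registry HtmlCrawler keeps.
// The real type and its fields are not shown in the documentation.
type urlRegistry struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newURLRegistry() *urlRegistry {
	return &urlRegistry{seen: make(map[string]struct{})}
}

// Add records url and reports whether it had not been seen before.
func (r *urlRegistry) Add(url string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.seen[url]; ok {
		return false
	}
	r.seen[url] = struct{}{}
	return true
}

// Len returns the number of distinct URLs recorded so far.
func (r *urlRegistry) Len() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.seen)
}

func main() {
	reg := newURLRegistry()
	var wg sync.WaitGroup
	// Thirty goroutines report the same three URLs concurrently.
	for i := 0; i < 30; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			reg.Add(fmt.Sprintf("https://example.com/page/%d", i%3))
		}(i)
	}
	wg.Wait()
	fmt.Println(reg.Len()) // prints 3
}
```

Guarding a plain map with a mutex keeps the sketch short; a sync.Map would be another reasonable choice.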
func NewHtmlCrawler ¶
func NewHtmlCrawler(c *http.Client) *HtmlCrawler
func (*HtmlCrawler) AddDiscoveryUrlRegex ¶
func (hc *HtmlCrawler) AddDiscoveryUrlRegex(expr string)
AddDiscoveryUrlRegex registers a new regular expression that is used to match URLs that should be collected for discovery.
func (*HtmlCrawler) AddExtractUrlRegex ¶
func (hc *HtmlCrawler) AddExtractUrlRegex(expr string)
AddExtractUrlRegex registers a new regular expression that is used to match URLs that should be collected for extraction.
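The two registration methods suggest that each found URL is checked against two lists of patterns. The sketch below illustrates that idea with a hypothetical urlMatcher type. Its methods reuse the documented names, but the internals, the Classify method, and the choice to panic on an invalid expression (via regexp.MustCompile) are assumptions, as is allowing a URL to match both lists.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlMatcher is a hypothetical stand-in for the pattern lists inside HtmlCrawler.
type urlMatcher struct {
	discovery []*regexp.Regexp
	extract   []*regexp.Regexp
}

// AddDiscoveryUrlRegex mirrors the documented method name.
// Panicking on an invalid expression is an assumption of this sketch.
func (m *urlMatcher) AddDiscoveryUrlRegex(expr string) {
	m.discovery = append(m.discovery, regexp.MustCompile(expr))
}

// AddExtractUrlRegex mirrors the documented method name.
func (m *urlMatcher) AddExtractUrlRegex(expr string) {
	m.extract = append(m.extract, regexp.MustCompile(expr))
}

// matchesAny reports whether url matches at least one of the patterns.
func matchesAny(patterns []*regexp.Regexp, url string) bool {
	for _, re := range patterns {
		if re.MatchString(url) {
			return true
		}
	}
	return false
}

// Classify (hypothetical) reports whether url should be collected
// for discovery, for extraction, or for both.
func (m *urlMatcher) Classify(url string) (discover, extract bool) {
	return matchesAny(m.discovery, url), matchesAny(m.extract, url)
}

func main() {
	m := &urlMatcher{}
	m.AddDiscoveryUrlRegex(`^https://example\.com/category/`)
	m.AddExtractUrlRegex(`^https://example\.com/product/\d+$`)

	for _, u := range []string{
		"https://example.com/category/shoes",
		"https://example.com/product/42",
		"https://example.com/about",
	} {
		d, e := m.Classify(u)
		fmt.Printf("%s discover=%t extract=%t\n", u, d, e)
	}
}
```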
func (*HtmlCrawler) Crawl ¶
func (hc *HtmlCrawler) Crawl(c *Call) *Data
Crawl crawls the given Call and returns the data and URLs it has found while doing so.
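To illustrate what fetching raw data and collecting URLs can involve, here is a minimal sketch built only on the standard library. It is not the package's implementation: fetchLinks, newTestServer, and the naive href regular expression are hypothetical, and the sketch returns an error where the documented Crawl returns only *Data. A local httptest server stands in for a real site.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"regexp"
)

// hrefRe is a deliberately naive link extractor used only for this sketch.
// It finds double-quoted href attributes and ignores everything else.
var hrefRe = regexp.MustCompile(`href="([^"]+)"`)

// fetchLinks (hypothetical) performs a GET with client.Do and returns the
// raw, unrendered body together with every href value found in it.
func fetchLinks(client *http.Client, url string) (string, []string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", nil, err
	}
	var links []string
	for _, m := range hrefRe.FindAllStringSubmatch(string(body), -1) {
		links = append(links, m[1])
	}
	return string(body), links, nil
}

// newTestServer serves a small static page so the sketch runs offline.
func newTestServer() *httptest.Server {
	page := `<html><body><a href="/category/shoes">Shoes</a><a href="/product/42">Item</a></body></html>`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, page)
	}))
}

func main() {
	srv := newTestServer()
	defer srv.Close()

	body, links, err := fetchLinks(srv.Client(), srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(body), links)
}
```

A production crawler would more likely parse the HTML properly and resolve relative links against the page URL; the regular expression here only keeps the example short.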
func (*HtmlCrawler) SetTag ¶
func (hc *HtmlCrawler) SetTag(t *attribute.Tag)
type Manager ¶
type Manager struct {
// contains filtered or unexported fields
}
Manager oversees all registered Crawler instances.
func NewManager ¶
func (*Manager) RegisterCrawler ¶
func (*Manager) RegisterCrawlers ¶
type RestCrawler ¶
RestCrawler crawls REST APIs using the provided Call instance.
func NewRestCrawler ¶
func NewRestCrawler(c *http.Client) *RestCrawler
NewRestCrawler returns a new instance of RestCrawler.
func (*RestCrawler) Crawl ¶
func (rc *RestCrawler) Crawl(c *Call) *Data
Crawl starts crawling based on the given Call instance and returns a Data instance containing the response as a string and any other relevant data found along the way.
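Returning the response as a string leaves decoding to the caller. The sketch below shows one way that could look for a JSON API. The restData type, its Status and Body fields, and the restCrawl and newAPIServer helpers are assumptions for illustration. Unlike the documented Crawl, the sketch also returns an error.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// restData is a hypothetical stand-in for Data; the field names are assumptions.
type restData struct {
	Status int
	Body   string
}

// restCrawl (hypothetical) requests url and returns the response body as a string.
func restCrawl(client *http.Client, url string) (*restData, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/json")

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return &restData{Status: resp.StatusCode, Body: string(b)}, nil
}

// newAPIServer serves a fixed JSON document so the sketch runs offline.
func newAPIServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"id":42,"name":"shoe"}`)
	}))
}

func main() {
	srv := newAPIServer()
	defer srv.Close()

	d, err := restCrawl(srv.Client(), srv.URL)
	if err != nil {
		panic(err)
	}

	// Further processing: decode the string body into a typed value.
	var item struct {
		ID   int    `json:"id"`
		Name string `json:"name"`
	}
	if err := json.Unmarshal([]byte(d.Body), &item); err != nil {
		panic(err)
	}
	fmt.Println(d.Status, item.ID, item.Name) // prints 200 42 shoe
}
```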
func (*RestCrawler) SetTag ¶
func (rc *RestCrawler) SetTag(t *attribute.Tag)