Documentation ¶
Index ¶
- Constants
- func LongestCommonPrefix(path1 string, path2 string) string
- func ReduceURL(base *neturl.URL, url *neturl.URL) string
- type Crawler
- type Downloaded
- func Download(input *Input) *Downloaded
- func (d *Downloaded) AddHeader(key string, value string)
- func (d *Downloaded) GetAssetURLs() []*neturl.URL
- func (d *Downloaded) GetDiscoveredURLs() []*neturl.URL
- func (d *Downloaded) GetHeaderKeys() []string
- func (d *Downloaded) GetHeaderValues(key string) []string
- func (d *Downloaded) ProcessURL(context urlContext, url string) (string, error)
- func (d *Downloaded) Reduce(url *neturl.URL) string
- type Input
- type Link
- type QueueItem
Constants ¶
const (
	// CSSUri url from url()
	CSSUri urlContext = 1 + iota
	// HTMLTagA url from <a href=""></a>
	HTMLTagA
	// HTMLTagForm url from <form action="" />
	HTMLTagForm
	// HTMLTagImg url from <img src="" />
	HTMLTagImg
	// HTMLTagLinkStylesheet url from <link rel="stylesheet" href="" />
	HTMLTagLinkStylesheet
	// HTMLTagScript url from <script src="" />
	HTMLTagScript
	// HTTP3xxLocation url from HTTP response code 3xx
	HTTP3xxLocation
)
Variables ¶
This section is empty.
Functions ¶
func LongestCommonPrefix ¶
LongestCommonPrefix returns the common path elements between two paths.
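A minimal usage sketch; the import path below is hypothetical (adjust it to this package's actual module path) and the inputs are illustrative:

package main

import (
	"fmt"

	crawler "example.com/crawler" // hypothetical import path
)

func main() {
	// Both paths share the /assets/css elements.
	prefix := crawler.LongestCommonPrefix("/assets/css/site.css", "/assets/css/print.css")
	fmt.Println(prefix) // prints the shared path prefix
}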
Types ¶
type Crawler ¶
type Crawler interface {
	GetClientTimeout() time.Duration

	SetAutoDownloadDepth(uint64)
	GetAutoDownloadDepth() uint64
	SetNoCrossHost(bool)
	GetNoCrossHost() bool

	AddRequestHeader(string, string)
	SetRequestHeader(string, string)
	GetRequestHeaderValues(string) []string

	SetWorkerCount(uint64) error
	GetWorkerCount() uint64

	SetURLRewriter(func(*url.URL))
	SetOnURLShouldQueue(func(*url.URL) bool)
	SetOnURLShouldDownload(func(*url.URL) bool)
	SetOnDownload(func(*url.URL))
	SetOnDownloaded(func(*Downloaded))

	GetEnqueuedCount() uint64
	GetDownloadedCount() uint64
	GetLinkFoundCount() uint64

	HasStarted() bool
	HasStopped() bool
	IsRunning() bool
	IsBusy() bool

	Start()
	Stop()
	Enqueue(QueueItem)
	Download(QueueItem) *Downloaded
	Downloaded() (*Downloaded, bool)
	DownloadedNotBlocking() *Downloaded
	// contains filtered or unexported methods
}
Crawler represents an object that can process download requests.
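This documentation does not show how a Crawler is constructed, so the sketch below takes an existing instance and wires illustrative settings and callbacks onto it; the depth, worker count, header, and callback bodies are all assumptions:

package example

import (
	"log"
	neturl "net/url"
	"strings"

	crawler "example.com/crawler" // hypothetical import path
)

// configure is a sketch: Crawler construction is not covered here,
// so an existing instance is taken as a parameter.
func configure(c crawler.Crawler) {
	c.SetAutoDownloadDepth(1)
	c.SetNoCrossHost(true)
	c.AddRequestHeader("User-Agent", "my-mirror/1.0") // illustrative header

	if err := c.SetWorkerCount(4); err != nil {
		log.Fatal(err)
	}

	// Only queue URLs under a whitelisted path (illustrative rule).
	c.SetOnURLShouldQueue(func(u *neturl.URL) bool {
		return strings.HasPrefix(u.Path, "/docs/")
	})

	// Log each completed download.
	c.SetOnDownloaded(func(d *crawler.Downloaded) {
		log.Printf("downloaded %s: status %d", d.BaseURL, d.StatusCode)
	})

	c.Start()
}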
type Downloaded ¶
type Downloaded struct {
	Input   *Input
	BaseURL *url.URL
	Body    string
	Error   error

	LinksAssets     map[string]Link
	LinksDiscovered map[string]Link

	StatusCode int
	// contains filtered or unexported fields
}
Downloaded represents processed data after downloading.
func Download ¶
func Download(input *Input) *Downloaded
Download returns parsed data after downloading the specified URL.
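A sketch of a one-off download; the target URL is a placeholder and the import path is hypothetical:

package main

import (
	"fmt"
	"log"
	"net/http"
	neturl "net/url"

	crawler "example.com/crawler" // hypothetical import path
)

func main() {
	u, err := neturl.Parse("https://example.com/") // placeholder URL
	if err != nil {
		log.Fatal(err)
	}

	d := crawler.Download(&crawler.Input{
		Client: http.DefaultClient,
		URL:    u,
	})
	if d.Error != nil {
		log.Fatal(d.Error)
	}
	fmt.Println(d.StatusCode, len(d.Body))
}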
func (*Downloaded) AddHeader ¶
func (d *Downloaded) AddHeader(key string, value string)
AddHeader adds a new header.
func (*Downloaded) GetAssetURLs ¶
func (d *Downloaded) GetAssetURLs() []*neturl.URL
GetAssetURLs returns the resolved asset URLs.
func (*Downloaded) GetDiscoveredURLs ¶
func (d *Downloaded) GetDiscoveredURLs() []*neturl.URL
GetDiscoveredURLs returns the resolved URLs of discovered links.
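Given a *Downloaded from a previous call, both this getter and GetAssetURLs can be iterated directly; a sketch, reusing the hypothetical crawler import alias from above:

package example

import (
	"fmt"

	crawler "example.com/crawler" // hypothetical import path
)

func printURLs(d *crawler.Downloaded) {
	for _, u := range d.GetAssetURLs() {
		fmt.Println("asset:", u)
	}
	for _, u := range d.GetDiscoveredURLs() {
		fmt.Println("discovered:", u)
	}
}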
func (*Downloaded) GetHeaderKeys ¶
func (d *Downloaded) GetHeaderKeys() []string
GetHeaderKeys returns all header keys.
func (*Downloaded) GetHeaderValues ¶
func (d *Downloaded) GetHeaderValues(key string) []string
GetHeaderValues returns the values of the specified header key.
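A sketch that adds an illustrative header and then walks all stored headers via the two getters:

package example

import (
	"fmt"

	crawler "example.com/crawler" // hypothetical import path
)

func dumpHeaders(d *crawler.Downloaded) {
	d.AddHeader("X-Example", "value") // illustrative key/value

	for _, key := range d.GetHeaderKeys() {
		for _, value := range d.GetHeaderValues(key) {
			fmt.Printf("%s: %s\n", key, value)
		}
	}
}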
func (*Downloaded) ProcessURL ¶
func (d *Downloaded) ProcessURL(context urlContext, url string) (string, error)
ProcessURL validates the URL and returns its rewritten string representation.
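The context argument takes one of the exported constants listed above; a sketch resolving an <a href> value found in a downloaded page (the relative link is illustrative):

package example

import (
	crawler "example.com/crawler" // hypothetical import path
)

func rewriteLink(d *crawler.Downloaded) (string, error) {
	// HTMLTagA marks the value as coming from an <a href=""> attribute.
	return d.ProcessURL(crawler.HTMLTagA, "../about.html")
}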
type Input ¶
type Input struct {
	Client      *http.Client
	Header      http.Header
	NoCrossHost bool
	Rewriter    *func(*url.URL)
	URL         *url.URL
}
Input represents a download request ready to be processed.
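Note that Rewriter is a pointer to a func, so a func value's address must be taken. A sketch of a fully populated Input; the timeout, header, and rewrite rule are illustrative:

package example

import (
	"net/http"
	neturl "net/url"
	"time"

	crawler "example.com/crawler" // hypothetical import path
)

func newInput(target *neturl.URL) *crawler.Input {
	rewrite := func(u *neturl.URL) {
		u.Scheme = "https" // e.g. force https on every processed URL
	}

	return &crawler.Input{
		Client:      &http.Client{Timeout: 30 * time.Second},
		Header:      http.Header{"User-Agent": {"my-mirror/1.0"}},
		NoCrossHost: true,
		Rewriter:    &rewrite, // pointer to the func value
		URL:         target,
	}
}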