Documentation ¶
Constants ¶
const (
	// DefaultOutputFileDot is the .dot file location to save the sitemap graph information to.
	DefaultOutputFileDot = "sitemap.dot"
	// DefaultOutputFileSvg is the .svg file location to save the sitemap graph to.
	DefaultOutputFileSvg = "sitemap.svg"
)
const FetchTimeout = 5 * time.Second
FetchTimeout is the maximum amount of time the parser will spend trying to fetch a single page.
Variables ¶
var (
	// ErrExternalDomain is returned when the given URL redirects to a domain outside the starting domain.
	ErrExternalDomain = errors.New("URL is outside the starting domain, ignoring")
	// ErrTooManyRedirects is returned after 10 consecutive redirects from a given URL.
	ErrTooManyRedirects = errors.New("stopped after 10 redirects")
)
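One way errors like these could arise is from a redirect policy installed on an http.Client. The following is a minimal sketch under that assumption, not the package's actual implementation: redirectPolicy and checkHop are hypothetical helpers, and comparing hostnames is only one possible definition of "outside the starting domain".

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/url"
)

// These mirror the package-level errors documented above.
var (
	ErrExternalDomain   = errors.New("URL is outside the starting domain, ignoring")
	ErrTooManyRedirects = errors.New("stopped after 10 redirects")
)

// redirectPolicy is a hypothetical helper. It returns a function suitable for
// http.Client.CheckRedirect that rejects redirects leaving startHost and stops
// once 10 requests have already been made.
func redirectPolicy(startHost string) func(req *http.Request, via []*http.Request) error {
	return func(req *http.Request, via []*http.Request) error {
		if len(via) >= 10 {
			return ErrTooManyRedirects
		}
		if req.URL.Hostname() != startHost {
			return ErrExternalDomain
		}
		return nil
	}
}

// checkHop is a hypothetical helper that applies the policy to one redirect
// target, pretending that `hops` requests were made before it.
func checkHop(startHost, target string, hops int) error {
	u, err := url.Parse(target)
	if err != nil {
		return err
	}
	req := &http.Request{URL: u}
	return redirectPolicy(startHost)(req, make([]*http.Request, hops))
}

func main() {
	// The policy would normally be attached to a client used for fetching.
	client := &http.Client{CheckRedirect: redirectPolicy("example.com")}
	_ = client

	fmt.Println(checkHop("example.com", "https://example.com/a", 1))
	fmt.Println(checkHop("example.com", "https://other.org/", 1))
	fmt.Println(checkHop("example.com", "https://example.com/b", 10))
}
```

When such a policy runs inside http.Client, the returned error is wrapped in a *url.Error, so callers would typically test for it with errors.Is rather than a direct comparison.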
Functions ¶
func Graph ¶
Graph renders the given sitemap as a graph and saves it to an SVG file. The graph is generated with dot, a Graphviz tool, which is run as an external command via exec and is assumed to be installed already. The sitemap data is first written to a .dot file, which is then passed to dot as its source.
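The two steps described here (write a .dot file, then hand it to dot) can be sketched roughly as follows. This is an illustration under stated assumptions rather than the package's code: toDot and renderSVG are hypothetical names, and a plain map[string][]string stands in for the Sitemap type.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sort"
	"strings"
)

// toDot renders a page-to-links mapping as Graphviz DOT source.
// The plain map is a stand-in for the package's Sitemap type (assumption).
func toDot(pages map[string][]string) string {
	var b strings.Builder
	b.WriteString("digraph sitemap {\n")

	// Sort the page URLs so the output is deterministic.
	keys := make([]string, 0, len(pages))
	for k := range pages {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, from := range keys {
		for _, to := range pages[from] {
			fmt.Fprintf(&b, "\t%q -> %q;\n", from, to)
		}
	}
	b.WriteString("}\n")
	return b.String()
}

// renderSVG writes the DOT source to dotPath and runs dot to produce svgPath.
// It fails if the dot binary is not installed or not on PATH.
func renderSVG(dotSrc, dotPath, svgPath string) error {
	if err := os.WriteFile(dotPath, []byte(dotSrc), 0o644); err != nil {
		return err
	}
	out, err := exec.Command("dot", "-Tsvg", "-o", svgPath, dotPath).CombinedOutput()
	if err != nil {
		return fmt.Errorf("dot failed: %w: %s", err, out)
	}
	return nil
}

func main() {
	src := toDot(map[string][]string{
		"https://example.com/": {"https://example.com/about"},
	})
	fmt.Print(src)

	if err := renderSVG(src, "sitemap.dot", "sitemap.svg"); err != nil {
		fmt.Println("skipping SVG:", err) // dot may not be installed
	}
}
```

Keeping the DOT generation separate from the exec call makes the first step testable on machines where Graphviz is not available.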
Types ¶
type CanonicalURL ¶
type CanonicalURL string
CanonicalURL represents the normalised page URL (a full URL with no query params or fragments).
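As an illustration of the normalisation described above, the sketch below resolves a link against the page it was found on and then drops the query string and fragment. The canonicalise helper is hypothetical; the package may normalise URLs differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// CanonicalURL mirrors the type documented above.
type CanonicalURL string

// canonicalise is a hypothetical helper. It resolves href against base so the
// result is a full URL, then strips query params and the fragment.
func canonicalise(base, href string) (CanonicalURL, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u, err := b.Parse(href) // resolves relative references against base
	if err != nil {
		return "", err
	}
	u.RawQuery = ""
	u.ForceQuery = false
	u.Fragment = ""
	u.RawFragment = ""
	return CanonicalURL(u.String()), nil
}

func main() {
	for _, href := range []string{"/a?b=c#d", "../about?x=1#team"} {
		c, err := canonicalise("https://example.com/docs/", href)
		fmt.Println(c, err)
	}
}
```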
type Crawler ¶
type Crawler struct {
// contains filtered or unexported fields
}
Crawler crawls a site from a given starting URL, up to a maximum depth.
func NewCrawler ¶
NewCrawler returns an instance of the Crawler with all its required properties initialised.
func (*Crawler) Crawl ¶
Crawl starts crawling from the starting URL given to the Crawler. Once the maximum depth is reached or no new pages are found, it returns a Sitemap struct with the results. Crawl accepts a cancellable context and stops crawling when the context is cancelled, returning the results collected so far.
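The behaviour described above (stop at a maximum depth, stop when nothing new is found, return partial results on cancellation) can be sketched as a breadth-first walk. Everything below is illustrative: crawl, linkFinder, demoSite and crawlDemo are hypothetical names, a map lookup replaces real fetching and parsing, and treating the starting page as depth 0 is an assumption.

```go
package main

import (
	"context"
	"fmt"
)

// linkFinder is a hypothetical stand-in for fetching and parsing one page.
type linkFinder func(ctx context.Context, page string) []string

// crawl walks pages breadth-first from start, visiting pages at depth
// 0..maxDepth. If ctx is cancelled it returns what it has collected so far.
func crawl(ctx context.Context, start string, maxDepth int, find linkFinder) map[string][]string {
	sitemap := make(map[string][]string)
	frontier := []string{start}
	for depth := 0; depth <= maxDepth && len(frontier) > 0; depth++ {
		var next []string
		for _, page := range frontier {
			if ctx.Err() != nil {
				return sitemap // cancelled: return partial results
			}
			if _, seen := sitemap[page]; seen {
				continue // already visited
			}
			links := find(ctx, page)
			sitemap[page] = links
			next = append(next, links...)
		}
		frontier = next
	}
	return sitemap
}

// demoSite is a tiny in-memory site used instead of real HTTP requests.
var demoSite = map[string][]string{
	"/":  {"/a", "/b"},
	"/a": {"/", "/c"},
	"/b": nil,
	"/c": {"/d"},
	"/d": nil,
}

// crawlDemo crawls demoSite, optionally cancelling the context up front.
func crawlDemo(maxDepth int, cancelFirst bool) map[string][]string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	return crawl(ctx, "/", maxDepth, func(_ context.Context, page string) []string {
		return demoSite[page]
	})
}

func main() {
	fmt.Println(len(crawlDemo(1, false)), "pages at max depth 1")
	fmt.Println(len(crawlDemo(10, false)), "pages with a generous depth limit")
	fmt.Println(len(crawlDemo(10, true)), "pages when cancelled before starting")
}
```

A real crawler would likely fetch pages concurrently; the sequential loop here only shows the stopping conditions.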
type Page ¶
type Page struct {
	Addr  CanonicalURL
	Links Links
}
Page defines the data structure representing a single web page. Addr is the full URL of the page with no query params or fragments. Links is a collection of links found on the page.
type Parser ¶
type Parser struct {
// contains filtered or unexported fields
}
Parser parses the DOM of a single web page.
type Sitemap ¶
type Sitemap map[CanonicalURL]Links
Sitemap is the data structure holding the current sitemap information. It maps each page URL to the links found on that page.
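To show how Page and Sitemap relate, here is a small sketch that records a page in a sitemap. The Links type is not defined in this documentation, so it is assumed to be a slice of CanonicalURL; record is a hypothetical helper.

```go
package main

import "fmt"

// CanonicalURL, Page and Sitemap mirror the types documented above.
type CanonicalURL string

// Links is assumed to be a slice of URLs; its real definition is not shown
// in this documentation.
type Links []CanonicalURL

type Page struct {
	Addr  CanonicalURL
	Links Links
}

type Sitemap map[CanonicalURL]Links

// record is a hypothetical helper that stores a page's links under its address.
func record(sm Sitemap, p Page) {
	sm[p.Addr] = p.Links
}

func main() {
	sm := Sitemap{}
	record(sm, Page{
		Addr:  "https://example.com/",
		Links: Links{"https://example.com/about", "https://example.com/contact"},
	})
	for addr, links := range sm {
		fmt.Println(addr, "->", links)
	}
}
```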