Documentation

Index

Constants
const (
    DefaultMaxDepth  = 1
    DefaultParallels = 2
    DefaultDelay     = 3
    DefaultAsync     = true
)
Variables
var ErrScrapingFailed = errors.New("scraper could not read URL, or scraping is not allowed for provided URL")
Functions

func ExtractTextFromHTML

func ExtractURL

func RemoveBlankLines
Types

type Options
type Options func(*Scraper)
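Options is a functional option: each With… helper below returns a closure that mutates the Scraper while it is being constructed. A minimal sketch of the pattern follows, assuming a constructor named New and the field names shown; neither appears in this documentation, so treat both as assumptions:

    package scraper

    // Assumed field set mirroring the defaults above; the real struct is not shown here.
    type Scraper struct {
        MaxDepth  int
        Parallels int
        Delay     int64
        Async     bool
        Blacklist []string
    }

    type Options func(*Scraper)

    // New starts from the package defaults, then lets each option override one of them.
    func New(opts ...Options) (*Scraper, error) {
        s := &Scraper{
            MaxDepth:  DefaultMaxDepth,
            Parallels: DefaultParallels,
            Delay:     DefaultDelay,
            Async:     DefaultAsync,
        }
        for _, opt := range opts {
            opt(s)
        }
        return s, nil
    }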
func WithAsync

WithAsync creates an Options function that sets whether the Scraper runs asynchronously.

Default value: true

async: whether the scraper should run asynchronously. Returns: an Options function.
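A plausible implementation, under the same assumed Scraper fields as the sketch above:

    func WithAsync(async bool) Options {
        return func(s *Scraper) {
            s.Async = async // applied when the constructor runs the options
        }
    }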
func WithBlacklist

WithBlacklist creates an Options function that appends the given URL endpoints to the Scraper's current exclusion list.

Default value:

    []string{
        "login",
        "signup",
        "signin",
        "register",
        "logout",
        "download",
        "redirect",
    }

blacklist: a slice of URL endpoints to exclude from scraping. Returns: an Options function.
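For example, to exclude extra endpoints on top of the defaults (usage snippet; the constructor name New is an assumption):

    s, err := scraper.New(scraper.WithBlacklist([]string{"admin", "cart"}))
    if err != nil {
        // handle the error
    }
    _ = s // "admin" and "cart" are now excluded in addition to the seven defaults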
func WithDelay
WithDelay creates an Options function that sets the delay of a Scraper.
The delay parameter specifies the amount of time in milliseconds that the Scraper should wait between requests.
Default value: 3
delay: the delay, in milliseconds, to set. Returns: an Options function.
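For example, to throttle the scraper to one request every 500 milliseconds (same assumed New constructor as above):

    s, err := scraper.New(scraper.WithDelay(500)) // wait 500 ms between requests
    if err != nil {
        // handle the error
    }
    _ = s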
func WithMaxDepth

WithMaxDepth creates an Options function that sets the maximum crawl depth of a Scraper.
Default value: 1
maxDepth: the maximum depth to set. Returns: an Options function.
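Usage sketch, under the same assumed New constructor:

    // Allow the scraper to recurse up to three levels of links.
    s, err := scraper.New(scraper.WithMaxDepth(3))
    if err != nil {
        // handle the error
    }
    _ = s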
func WithNewBlacklist

WithNewBlacklist creates an Options function that replaces the Scraper's current exclusion list with a new list of URL endpoints.

Default value:

    []string{
        "login",
        "signup",
        "signin",
        "register",
        "logout",
        "download",
        "redirect",
    }

blacklist: a slice of URL endpoints to exclude from scraping. Returns: an Options function.
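Unlike WithBlacklist, which appends, WithNewBlacklist discards the defaults entirely; a usage sketch under the same assumptions:

    // Only these two endpoints are excluded; the default list above is replaced.
    s, err := scraper.New(scraper.WithNewBlacklist([]string{"checkout", "cart"}))
    if err != nil {
        // handle the error
    }
    _ = s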
func WithParallelsNum

WithParallelsNum creates an Options function that sets the maximum number of concurrent requests allowed for matching domains.

Default value: 2

parallels: the maximum number of concurrent requests to allow. Returns: an Options function.
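Putting it together, a hypothetical construction that overrides each default; New and the import path are assumptions, while the option helpers are the documented ones:

    package main

    import scraper "example.com/scraper" // assumed import path

    func main() {
        s, err := scraper.New(
            scraper.WithMaxDepth(2),                  // recurse two levels instead of the default 1
            scraper.WithParallelsNum(4),              // 4 concurrent requests instead of 2
            scraper.WithDelay(250),                   // 250 ms between requests instead of 3
            scraper.WithAsync(false),                 // run synchronously instead of async
            scraper.WithBlacklist([]string{"admin"}), // exclude "admin" in addition to the defaults
        )
        if err != nil {
            panic(err)
        }
        _ = s // ready to scrape
    }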