scraper

package
v0.0.0-...-0fd79f2 Latest
Warning

This package is not in the latest version of its module.

Published: Jul 20, 2024 License: GPL-3.0 Imports: 8 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Scrape

func Scrape(starterUrl *url.URL, opts ...*ScrapeOptions) error

Types

type DomainRestriction

type DomainRestriction int8
const (
	None DomainRestriction = iota - 1
	SameDomainOnly
	ListOfDomains
)
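Because the constant block starts at iota - 1, None is -1, SameDomainOnly is 0, and ListOfDomains is 1. The zero value of DomainRestriction is therefore SameDomainOnly, which is how an unset field ends up with the documented default. A minimal sketch that re-declares the type locally (so it compiles without the package) to show the values:

```go
package main

import "fmt"

// Local copy of the declaration above, repeated so the sketch compiles on its own.
type DomainRestriction int8

const (
	None           DomainRestriction = iota - 1 // -1
	SameDomainOnly                              // 0
	ListOfDomains                               // 1
)

func main() {
	var zero DomainRestriction // zero value of the type

	fmt.Println(None, SameDomainOnly, ListOfDomains) // -1 0 1
	fmt.Println(zero == SameDomainOnly)              // true
}
```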

type ScrapeOptions

type ScrapeOptions struct {
	// DomainRestriction is the restriction imposed on the domains of links followed while scraping. It has 3 possible values:
	//  - None: Scrape imposes no restriction on domain names when scraping pages
	//  - SameDomainOnly: Scrape only follows pages in the same domain as the starterUrl argument
	//  - ListOfDomains: Scrape only follows pages whose domain appears in DomainList
	// The default value is SameDomainOnly.
	DomainRestriction DomainRestriction

	// DomainList is the list of domains that Scrape is limited to when scraping pages.
	// It is ignored unless DomainRestriction is set to ListOfDomains.
	DomainList []string

	// DepthLimit is the maximum link depth that Scrape will follow when scraping a page.
	// If DepthLimit is 0, no limit is applied and Scrape stops only when there are no more links to follow.
	DepthLimit int
}

ScrapeOptions represents the options that can be passed to the Scrape function. Callers can freely instantiate it and pass it in.
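The field comments describe two independent checks: a domain filter and a depth limit. The sketch below is one way such checks could be written, not the package's implementation. The helpers allowedHost, withinDepth, and mustParse are hypothetical names; exact host matching and an inclusive depth comparison are assumptions; the types are re-declared locally so the sketch compiles on its own.

```go
package main

import (
	"fmt"
	"net/url"
)

type DomainRestriction int8

const (
	None DomainRestriction = iota - 1
	SameDomainOnly
	ListOfDomains
)

type ScrapeOptions struct {
	DomainRestriction DomainRestriction
	DomainList        []string
	DepthLimit        int
}

// allowedHost is a hypothetical helper, not part of the package. It reports
// whether link may be followed for a crawl that started at start.
// Assumption: domains are compared by exact host name.
func allowedHost(start, link *url.URL, opts ScrapeOptions) bool {
	switch opts.DomainRestriction {
	case None:
		return true
	case SameDomainOnly:
		return link.Hostname() == start.Hostname()
	case ListOfDomains:
		for _, d := range opts.DomainList {
			if link.Hostname() == d {
				return true
			}
		}
	}
	return false
}

// withinDepth is a hypothetical helper. A DepthLimit of 0 means "no limit";
// treating the limit as inclusive is an assumption.
func withinDepth(depth int, opts ScrapeOptions) bool {
	return opts.DepthLimit == 0 || depth <= opts.DepthLimit
}

// mustParse keeps the example short by panicking on a malformed URL.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

func main() {
	start := mustParse("https://example.com/")
	same := mustParse("https://example.com/about")
	other := mustParse("https://example.org/")

	var defaults ScrapeOptions // zero value: SameDomainOnly, no depth limit
	fmt.Println(allowedHost(start, same, defaults))  // true
	fmt.Println(allowedHost(start, other, defaults)) // false

	listed := ScrapeOptions{
		DomainRestriction: ListOfDomains,
		DomainList:        []string{"example.org"},
		DepthLimit:        2,
	}
	fmt.Println(allowedHost(start, other, listed)) // true
	fmt.Println(withinDepth(3, listed))            // false
	fmt.Println(withinDepth(3, defaults))          // true
}
```

Note that DomainList has no effect in the defaults case, matching the comment that it is ignored unless DomainRestriction is ListOfDomains.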
