scraper

package
v1.0.2 Latest
Warning

This package is not in the latest version of its module.

Published: Dec 4, 2024 License: MIT Imports: 20 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Scraper

type Scraper struct {
	URL *urlpkg.URL // main URL to parse; it is modified if the server redirects

	Client download.HttpClient
	Fs     afero.Fs // filesystem

	// ETagsDB stores ETags (hashes of file state) for each URL
	ETagsDB *db.DB
	// contains filtered or unexported fields
}
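
The ETagsDB comment describes ETags as per-URL hashes of file state. As a rough illustration of why a scraper might keep them, the sketch below issues a conditional GET with the standard If-None-Match header and skips the download when the server answers 304 Not Modified. The names etagStore, fetch, and newETagServer are hypothetical and use only the Go standard library; this is not the package's own db.DB or download code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// etagStore is a hypothetical in-memory stand-in for an ETag database:
// it maps a URL to the last ETag the server reported for it.
type etagStore map[string]string

// fetch performs a conditional GET. It returns true when the resource is new
// or has changed, and false when the server replies 304 Not Modified.
func fetch(client *http.Client, store etagStore, url string) (bool, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	if etag, ok := store[url]; ok {
		req.Header.Set("If-None-Match", etag)
	}
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusNotModified {
		return false, nil
	}
	if etag := resp.Header.Get("ETag"); etag != "" {
		store[url] = etag
	}
	return true, nil
}

// newETagServer starts a local test server that always serves the same
// content under a fixed ETag and honors If-None-Match.
func newETagServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		const etag = `"v1"`
		w.Header().Set("ETag", etag)
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		fmt.Fprint(w, "hello")
	}))
}

func main() {
	srv := newETagServer()
	defer srv.Close()

	store := etagStore{}
	for i := 0; i < 2; i++ {
		changed, err := fetch(srv.Client(), store, srv.URL)
		if err != nil {
			panic(err)
		}
		fmt.Println("changed:", changed) // true on the first request, false on the second
	}
}
```

A persistent store would let a later run skip files that have not changed since the previous scrape.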

Scraper contains all scraping data, starts the process and handles the concurrency. It includes the logic to decide what URLs to include/exclude and when to stop.
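
The documentation does not show how the include/exclude and stop decisions are made. The following is a minimal sketch of one common approach, assuming regular-expression include and exclude lists, a same-host restriction, and a maximum link depth. All names here (filter, newFilter, visit) are hypothetical and are not part of this package's API.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// filter is a hypothetical set of crawl rules.
type filter struct {
	host     string           // only URLs on this host are followed
	includes []*regexp.Regexp // if non-empty, the path must match at least one
	excludes []*regexp.Regexp // a path matching any of these is skipped
	maxDepth int              // 0 means unlimited
}

// compileAll compiles each pattern and panics on an invalid expression.
func compileAll(patterns []string) []*regexp.Regexp {
	out := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		out = append(out, regexp.MustCompile(p))
	}
	return out
}

func newFilter(host string, include, exclude []string, maxDepth int) filter {
	return filter{
		host:     host,
		includes: compileAll(include),
		excludes: compileAll(exclude),
		maxDepth: maxDepth,
	}
}

// shouldVisit applies the rules in order: depth limit, host check,
// exclude list, then include list.
func (f filter) shouldVisit(u *url.URL, depth int) bool {
	if f.maxDepth > 0 && depth > f.maxDepth {
		return false
	}
	if u.Host != f.host {
		return false
	}
	for _, re := range f.excludes {
		if re.MatchString(u.Path) {
			return false
		}
	}
	if len(f.includes) == 0 {
		return true
	}
	for _, re := range f.includes {
		if re.MatchString(u.Path) {
			return true
		}
	}
	return false
}

// visit parses rawURL and applies the filter; unparsable URLs are rejected.
func visit(f filter, rawURL string, depth int) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	return f.shouldVisit(u, depth)
}

func main() {
	f := newFilter("example.com", []string{`^/docs/`}, []string{`\.pdf$`}, 2)
	fmt.Println(visit(f, "https://example.com/docs/intro", 1))      // true
	fmt.Println(visit(f, "https://example.com/docs/manual.pdf", 1)) // false: excluded
	fmt.Println(visit(f, "https://other.org/docs/", 1))             // false: other host
	fmt.Println(visit(f, "https://example.com/docs/deep", 3))       // false: too deep
}
```

Checking excludes before includes means an exclude pattern always wins when both match, which is one possible precedence among several.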

func New

func New(cfg config.Config, url *urlpkg.URL, fs afero.Fs) (*Scraper, error)

New creates a new Scraper instance.

func (*Scraper) Cookies

func (sc *Scraper) Cookies() []config.Cookie

Cookies returns the current cookies.

func (*Scraper) Downloader

func (sc *Scraper) Downloader() *download.Download

func (*Scraper) Start

func (sc *Scraper) Start(ctx context.Context) error

Start starts the scraping.
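
Because Start accepts a context.Context, a caller can presumably bound or cancel a run through that context. The sketch below shows the calling pattern with a timeout. It uses a hypothetical stand-in type, fakeScraper, that has the same Start signature, since how the real Scraper reacts to cancellation is not documented here and is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// starter mirrors the documented method signature Start(ctx) error.
type starter interface {
	Start(ctx context.Context) error
}

// fakeScraper is a hypothetical stand-in: it "works" for a fixed duration
// or returns early with the context's error when the context is done.
type fakeScraper struct{ work time.Duration }

func (f fakeScraper) Start(ctx context.Context) error {
	select {
	case <-time.After(f.work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// runWithTimeout bounds a single Start call with a deadline.
func runWithTimeout(s starter, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // release the timer even when Start returns early
	return s.Start(ctx)
}

// timedOut reports whether err was caused by the deadline expiring.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	// The work takes longer than the deadline, so the call times out.
	err := runWithTimeout(fakeScraper{work: time.Second}, 50*time.Millisecond)
	fmt.Println("timed out:", timedOut(err))

	// The work finishes well within the deadline.
	err = runWithTimeout(fakeScraper{work: time.Millisecond}, time.Second)
	fmt.Println("error:", err)
}
```

For interactive command-line use, signal.NotifyContext from os/signal can replace the timeout so that Ctrl+C cancels the context instead.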
