scraper

package

v1.0.2 Published: Dec 4, 2024 License: MIT Imports: 20 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/rickb777/goscrape2

Documentation ¶

Index ¶

type Scraper
- func New(cfg config.Config, url *urlpkg.URL, fs afero.Fs) (*Scraper, error)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Scraper ¶

type Scraper struct {
	URL *urlpkg.URL // the main URL to parse; it is modified in case of a redirect

	Client download.HttpClient
	Fs     afero.Fs // filesystem

	// ETagsDB stores ETags (hashes of file state) for each URL
	ETagsDB *db.DB
	// contains filtered or unexported fields
}
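The ETagsDB field suggests that the scraper uses HTTP conditional requests to avoid re-downloading unchanged files. The sketch below illustrates that general technique with the standard library only. It is not the package's implementation: `etagStore`, `fetch`, and `newTestServer` are hypothetical names, and the in-memory map merely stands in for a persistent database such as `db.DB`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// etagStore is a hypothetical in-memory stand-in for an ETag database.
// It is NOT the db.DB type referenced by Scraper.
type etagStore struct {
	mu    sync.Mutex
	etags map[string]string // URL -> last seen ETag
}

func newETagStore() *etagStore {
	return &etagStore{etags: make(map[string]string)}
}

// fetch issues a GET for url and sends If-None-Match when an ETag is known.
// It returns false when the server answers 304 Not Modified and true for
// any other response, remembering the new ETag if one was sent.
func (s *etagStore) fetch(client *http.Client, url string) (bool, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	s.mu.Lock()
	if tag, ok := s.etags[url]; ok {
		req.Header.Set("If-None-Match", tag)
	}
	s.mu.Unlock()

	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusNotModified {
		return false, nil // the stored copy is still current
	}
	if tag := resp.Header.Get("ETag"); tag != "" {
		s.mu.Lock()
		s.etags[url] = tag
		s.mu.Unlock()
	}
	return true, nil
}

// newTestServer starts a local server that always serves ETag "v1" and
// honours If-None-Match, so the example needs no external network access.
func newTestServer() *httptest.Server {
	const tag = `"v1"`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("If-None-Match") == tag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Header().Set("ETag", tag)
		fmt.Fprint(w, "hello")
	}))
}

func main() {
	srv := newTestServer()
	defer srv.Close()

	store := newETagStore()
	for i := 0; i < 2; i++ {
		fresh, err := store.fetch(srv.Client(), srv.URL)
		if err != nil {
			panic(err)
		}
		fmt.Println("fresh content:", fresh) // true on the first pass, false on the second
	}
}
```

The second request carries the stored ETag, so the server can reply with 304 and no body.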

Scraper contains all scraping data, starts the process and handles the concurrency. It includes the logic to decide what URLs to include/exclude and when to stop.
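As a rough illustration of what include/exclude and stop logic can look like, here is a minimal, self-contained sketch. Everything in it is an assumption made for the example: the `filter` type, its fields, the `newFilter` and `mustParse` helpers, and the rule that excludes take precedence over includes are hypothetical and are not taken from the package's source or from `config.Config`.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// filter is a hypothetical rule set; the real Scraper keeps its rules in
// unexported fields that this page does not document.
type filter struct {
	host     string
	includes []*regexp.Regexp
	excludes []*regexp.Regexp
	maxDepth int // 0 means unlimited
}

// newFilter compiles the given path patterns; it panics on an invalid pattern.
func newFilter(host string, include, exclude []string, maxDepth int) filter {
	f := filter{host: host, maxDepth: maxDepth}
	for _, p := range include {
		f.includes = append(f.includes, regexp.MustCompile(p))
	}
	for _, p := range exclude {
		f.excludes = append(f.excludes, regexp.MustCompile(p))
	}
	return f
}

// shouldFollow reports whether u, found at the given link depth, should be
// downloaded. Assumed precedence: host check, depth limit, excludes, includes.
func (f filter) shouldFollow(u *url.URL, depth int) bool {
	if u.Host != f.host {
		return false // stay on the starting host
	}
	if f.maxDepth > 0 && depth > f.maxDepth {
		return false // the stop condition: too many links from the start
	}
	for _, re := range f.excludes {
		if re.MatchString(u.Path) {
			return false
		}
	}
	if len(f.includes) == 0 {
		return true // no include rules means everything else is allowed
	}
	for _, re := range f.includes {
		if re.MatchString(u.Path) {
			return true
		}
	}
	return false
}

// mustParse parses raw or panics; it keeps the example short.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

func main() {
	f := newFilter("example.org", []string{`^/docs/`}, []string{`\.pdf$`}, 2)
	cases := []struct {
		raw   string
		depth int
	}{
		{"https://example.org/docs/intro", 1},      // true
		{"https://example.org/docs/manual.pdf", 1}, // false: excluded
		{"https://other.example/docs/", 1},         // false: different host
		{"https://example.org/docs/deep", 3},       // false: beyond maxDepth
		{"https://example.org/blog", 1},            // false: not included
	}
	for _, c := range cases {
		fmt.Println(c.raw, f.shouldFollow(mustParse(c.raw), c.depth))
	}
}
```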

func New ¶

func New(cfg config.Config, url *urlpkg.URL, fs afero.Fs) (*Scraper, error)

New creates a new Scraper instance.

func (*Scraper) Cookies ¶

func (sc *Scraper) Cookies() []config.Cookie

Cookies returns the current cookies.

func (*Scraper) Downloader ¶

func (sc *Scraper) Downloader() *download.Download

func (*Scraper) Start ¶

func (sc *Scraper) Start(ctx context.Context) error

Start starts the scraping.
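Because Start takes a context.Context, callers can presumably cancel a running scrape or give it a deadline. The sketch below shows one common way a context-aware, concurrent crawl loop can be structured. It is a generic pattern and not the package's code: `crawl`, `runDemo`, the worker count, and the `visit` callback are all hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// crawl is a hypothetical stand-in for a Start-style method. It hands urls
// to a fixed number of workers and stops handing out work once ctx is done.
// It returns ctx.Err(), which is nil unless ctx was cancelled or timed out.
func crawl(ctx context.Context, urls []string, workers int, visit func(context.Context, string) error) error {
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				_ = visit(ctx, u) // a real crawler would record or log this error
			}
		}()
	}

feed:
	for _, u := range urls {
		// Check first so an already-cancelled context never races with a ready worker.
		if ctx.Err() != nil {
			break feed
		}
		select {
		case jobs <- u:
		case <-ctx.Done():
			break feed
		}
	}
	close(jobs) // lets the workers drain and exit
	wg.Wait()
	return ctx.Err()
}

// runDemo crawls three placeholder URLs and returns how many were visited.
// When cancelFirst is true the context is cancelled before the crawl begins.
func runDemo(cancelFirst bool) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}

	var mu sync.Mutex
	visited := 0
	visit := func(_ context.Context, _ string) error {
		mu.Lock()
		visited++
		mu.Unlock()
		return nil
	}

	err := crawl(ctx, []string{"a", "b", "c"}, 2, visit)
	return visited, err
}

func main() {
	fmt.Println(runDemo(false)) // 3 <nil>
	fmt.Println(runDemo(true))  // 0 context canceled
}
```

With the real package, the equivalent caller-side step would be to build the context with context.WithCancel or context.WithTimeout and pass it to Start.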
