scraper

package
v0.1.14
Published: Oct 30, 2024 License: MIT Imports: 9 Imported by: 0

Documentation

Overview

Package scraper contains an implementation of the tool interface for a web scraping tool.

Constants

const (
	DefualtMaxDept   = 1
	DefualtParallels = 2
	DefualtDelay     = 3
	DefualtAsync     = true
)

Variables

var ErrScrapingFailed = errors.New("scraper could not read URL, or scraping is not allowed for provided URL")
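
Because ErrScrapingFailed is an exported sentinel value, callers would normally test for it with errors.Is rather than by comparing message strings. The sketch below is self-contained and uses a local copy of the variable plus a hypothetical scrape helper; whether the real package wraps the sentinel or returns it directly is not stated here, and errors.Is handles both cases.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrScrapingFailed is a local stand-in for the package's exported sentinel error.
var ErrScrapingFailed = errors.New("scraper could not read URL, or scraping is not allowed for provided URL")

// scrape is a hypothetical helper that simulates a failed scrape by
// wrapping the sentinel with extra context.
func scrape(url string) (string, error) {
	if url == "" {
		return "", fmt.Errorf("scrape %q: %w", url, ErrScrapingFailed)
	}
	return "page text for " + url, nil
}

// isScrapingFailure reports whether err is, or wraps, ErrScrapingFailed.
func isScrapingFailure(err error) bool {
	return errors.Is(err, ErrScrapingFailed)
}

func main() {
	if _, err := scrape(""); isScrapingFailure(err) {
		fmt.Println("scraping failed:", err)
	}
}
```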

Functions

This section is empty.

Types

type Options

type Options func(*Scraper)
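
Options follows the functional options pattern: each option is a closure that mutates a *Scraper. The following is a minimal sketch of how such options compose, using local stand-in types that mirror the documented declarations. The withMaxDepth and apply helpers are hypothetical and are not part of the package.

```go
package main

import "fmt"

// Scraper mirrors the documented struct fields; it is a local stand-in.
type Scraper struct {
	MaxDepth  int
	Parallels int
	Delay     int64
	Blacklist []string
	Async     bool
}

// Options matches the documented type: a function that mutates a *Scraper.
type Options func(*Scraper)

// withMaxDepth is a hypothetical option written in the style of the
// package's With* helpers.
func withMaxDepth(maxDepth int) Options {
	return func(s *Scraper) { s.MaxDepth = maxDepth }
}

// apply runs each option against s in order, so a later option
// overrides an earlier one that sets the same field.
func apply(s *Scraper, opts ...Options) {
	for _, opt := range opts {
		opt(s)
	}
}

func main() {
	s := &Scraper{MaxDepth: 1}
	apply(s, withMaxDepth(2), withMaxDepth(3))
	fmt.Println(s.MaxDepth) // prints 3
}
```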

func WithAsync

func WithAsync(async bool) Options

WithAsync creates an Options function that sets whether the Scraper runs asynchronously.

Default value: true

async: whether the scraper should run asynchronously. Returns: an Options function.

func WithBlacklist

func WithBlacklist(blacklist []string) Options

WithBlacklist creates an Options function that appends URL endpoints to the current list of endpoints excluded from scraping.

Default value:

[]string{
	"login",
	"signup",
	"signin",
	"register",
	"logout",
	"download",
	"redirect",
},

blacklist: slice of URL endpoints to exclude from scraping. Returns: an Options function.

func WithDelay

func WithDelay(delay int64) Options

WithDelay creates an Options function that sets the delay of a Scraper.

The delay parameter specifies the amount of time in milliseconds that the Scraper should wait between requests.

Default value: 3

delay: the delay to set. Returns: an Options function.

func WithMaxDepth

func WithMaxDepth(maxDepth int) Options

WithMaxDepth sets the maximum depth for the Scraper.

Default value: 1

maxDepth: the maximum depth to set. Returns: an Options function.

func WithNewBlacklist

func WithNewBlacklist(blacklist []string) Options

WithNewBlacklist creates an Options function that replaces the list of URL endpoints excluded from scraping with a new list.

Default value:

[]string{
	"login",
	"signup",
	"signin",
	"register",
	"logout",
	"download",
	"redirect",
},

blacklist: slice of URL endpoints to exclude from scraping. Returns: an Options function.
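
The difference between WithBlacklist (append) and WithNewBlacklist (replace) can be sketched with two local options. This is an illustration of the documented behaviour under assumed internals, not the package's source; appendBlacklist, replaceBlacklist, and defaultScraper are hypothetical names.

```go
package main

import "fmt"

// Scraper is a local stand-in holding only the field relevant here.
type Scraper struct {
	Blacklist []string
}

// Options matches the documented option type.
type Options func(*Scraper)

// appendBlacklist sketches the documented WithBlacklist behaviour:
// new endpoints are added to whatever the list already holds.
func appendBlacklist(blacklist []string) Options {
	return func(s *Scraper) { s.Blacklist = append(s.Blacklist, blacklist...) }
}

// replaceBlacklist sketches the documented WithNewBlacklist behaviour:
// the existing list is discarded in favour of the new one.
func replaceBlacklist(blacklist []string) Options {
	return func(s *Scraper) { s.Blacklist = blacklist }
}

// defaultScraper seeds the documented default blacklist.
func defaultScraper() *Scraper {
	return &Scraper{Blacklist: []string{
		"login", "signup", "signin", "register", "logout", "download", "redirect",
	}}
}

func main() {
	a := defaultScraper()
	appendBlacklist([]string{"admin"})(a)
	fmt.Println(len(a.Blacklist)) // prints 8

	r := defaultScraper()
	replaceBlacklist([]string{"admin"})(r)
	fmt.Println(len(r.Blacklist)) // prints 1
}
```

Appending suits callers who want to keep the default exclusions and add a few of their own; replacing suits callers who need full control over the list.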

func WithParallelsNum

func WithParallelsNum(parallels int) Options

WithParallelsNum sets the maximum number of concurrent requests allowed for matching domains.

Default value: 2

parallels: the maximum number of concurrent requests. Returns: an Options function.

type Scraper

type Scraper struct {
	MaxDepth  int
	Parallels int
	Delay     int64
	Blacklist []string
	Async     bool
}

func New

func New(options ...Options) (*Scraper, error)

New creates a new instance of Scraper with the provided options.

The options parameter is variadic, so the caller can pass any number of configuration options. Each option is a function that modifies the Scraper's fields.

The function returns a pointer to a Scraper instance and an error. The error value is nil if the Scraper is created successfully.
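
A constructor of this shape typically seeds the documented defaults and then applies each option in turn. The sketch below assumes that behaviour and uses local stand-ins throughout; newScraper is a hypothetical name, and the constant names are spelled exactly as the package documents them.

```go
package main

import "fmt"

// Local copies of the documented constants, spelling preserved.
const (
	DefualtMaxDept   = 1
	DefualtParallels = 2
	DefualtDelay     = 3
	DefualtAsync     = true
)

// Scraper mirrors the documented struct fields.
type Scraper struct {
	MaxDepth  int
	Parallels int
	Delay     int64
	Blacklist []string
	Async     bool
}

// Options matches the documented option type.
type Options func(*Scraper)

// newScraper is a hypothetical constructor: defaults first, then options.
// This sketch never fails, so the returned error is always nil.
func newScraper(options ...Options) (*Scraper, error) {
	s := &Scraper{
		MaxDepth:  DefualtMaxDept,
		Parallels: DefualtParallels,
		Delay:     DefualtDelay,
		Async:     DefualtAsync,
		Blacklist: []string{
			"login", "signup", "signin", "register", "logout", "download", "redirect",
		},
	}
	for _, opt := range options {
		opt(s)
	}
	return s, nil
}

func main() {
	// Override only the parallelism; every other field keeps its default.
	s, err := newScraper(func(s *Scraper) { s.Parallels = 4 })
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.MaxDepth, s.Parallels, s.Delay, s.Async) // prints 1 4 3 true
}
```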

func (Scraper) Call

func (s Scraper) Call(ctx context.Context, input string) (string, error)

Call scrapes a website and returns the site data.

The function takes a context.Context object for managing the execution context and a string input representing the URL of the website to be scraped. It returns a string containing the scraped data and an error if any.

func (Scraper) Description

func (s Scraper) Description() string

Description returns a description of the Scraper tool.

There are no parameters. It returns a string.

func (Scraper) Name

func (s Scraper) Name() string

Name returns the name of the Scraper.

No parameters. Returns a string.
