package scraper

Version: v0.2.0 (latest) | Published: Jun 21, 2024 | License: MIT | Imports: 27 | Imported by: 0

Repository

github.com/cornelk/goscrape

Documentation ¶

Index ¶

Constants
func Headers(headers []string) http.Header
func ServeDirectory(ctx context.Context, path string, port int16, logger *log.Logger) error
type Config
type Cookie
type Scraper
- func New(logger *log.Logger, cfg Config) (*Scraper, error)
- func (s *Scraper) Cookies() []Cookie
- func (s *Scraper) Start(ctx context.Context) error

Constants ¶

const (
	// PageExtension is the file extension that downloaded pages get.
	PageExtension = ".html"
	// PageDirIndex is the file name of the index file for every dir.
	PageDirIndex = "index" + PageExtension
)
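As a rough illustration of how these two constants could be used when turning a URL path into a file name on disk, here is a minimal sketch. The helper `localPagePath` and its mapping rules are assumptions made for this example and are not taken from the package; only the two constant values come from the declaration above.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Values copied from the constant block above.
const (
	PageExtension = ".html"
	PageDirIndex  = "index" + PageExtension
)

// localPagePath is a hypothetical helper (not part of the package). It maps a
// URL path to a relative file path: directory-like paths get the index file
// name, and paths without an extension get the page extension.
func localPagePath(urlPath string) string {
	p := strings.TrimPrefix(urlPath, "/")
	if p == "" || strings.HasSuffix(p, "/") {
		return p + PageDirIndex
	}
	if path.Ext(p) == "" {
		return p + PageExtension
	}
	return p
}

func main() {
	for _, p := range []string{"/", "/docs/", "/about", "/img/logo.png"} {
		fmt.Printf("%-14s -> %s\n", p, localPagePath(p))
	}
}
```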

Variables ¶

This section is empty.

Functions ¶

func Headers ¶ added in v0.2.0

func Headers(headers []string) http.Header
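The page does not document the expected format of the input strings. The sketch below assumes each entry has the form "Name: value" and shows one way such a list could be turned into an http.Header using only the standard library. `parseHeaders` is a hypothetical stand-in, not the package's implementation, and the real function may treat whitespace or malformed entries differently.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// parseHeaders is a hypothetical stand-in for illustration. It assumes every
// entry looks like "Name: value" and splits on the first colon only, so
// values may themselves contain colons.
func parseHeaders(lines []string) http.Header {
	h := http.Header{}
	for _, line := range lines {
		name, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // skip entries without a separator
		}
		h.Add(strings.TrimSpace(name), strings.TrimSpace(value))
	}
	return h
}

func main() {
	h := parseHeaders([]string{"Accept-Language: en", "X-Token: abc:def", "broken"})
	fmt.Println(h.Get("Accept-Language"), h.Get("X-Token"), len(h))
}
```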

func ServeDirectory ¶ added in v0.2.0

func ServeDirectory(ctx context.Context, path string, port int16, logger *log.Logger) error
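Note that port is an int16, so the largest port that can be passed is 32767. The function body is not documented here. As a rough idea of what serving a directory until a context is cancelled can look like with the standard library, consider the following sketch. `newServer` and `serveDir` are hypothetical names, the timeouts are arbitrary, and the logger parameter is left out for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// newServer builds an HTTP server that serves the files below dir on port.
// Hypothetical helper; the timeout value is an arbitrary choice.
func newServer(dir string, port int16) *http.Server {
	return &http.Server{
		Addr:              fmt.Sprintf(":%d", port),
		Handler:           http.FileServer(http.Dir(dir)),
		ReadHeaderTimeout: 10 * time.Second,
	}
}

// serveDir runs the server until ctx is cancelled, then shuts it down
// gracefully. A listen error is returned to the caller.
func serveDir(ctx context.Context, dir string, port int16) error {
	srv := newServer(dir, port)
	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServe() }()

	select {
	case <-ctx.Done():
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		return srv.Shutdown(shutdownCtx)
	case err := <-errCh:
		if errors.Is(err, http.ErrServerClosed) {
			return nil
		}
		return err
	}
}

func main() {
	// Serve the current directory briefly, then stop via the context.
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	if err := serveDir(ctx, ".", 8080); err != nil {
		fmt.Println("server error:", err)
	}
}
```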

Types ¶

type Config ¶ added in v0.1.1

type Config struct {
	URL      string
	Includes []string
	Excludes []string

	ImageQuality uint // image quality from 0 to 100%, 0 to disable reencoding
	MaxDepth     uint // download depth, 0 for unlimited
	Timeout      uint // time limit in seconds to process each http request

	OutputDirectory string
	Username        string
	Password        string

	Cookies   []Cookie
	Header    http.Header
	Proxy     string
	UserAgent string
}

Config contains the scraper configuration.
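The field comments give zero a special meaning for ImageQuality and MaxDepth, and define Timeout in seconds. The sketch below shows one way a caller or implementation might interpret those conventions. All three helpers are hypothetical, and the exact boundary handling (for example whether a page at depth equal to MaxDepth is still downloaded) is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// requestTimeout converts a Timeout value given in seconds into a Duration.
// A zero input yields a zero Duration, which net/http's Client treats as
// "no timeout"; whether the scraper does the same is not documented here.
func requestTimeout(seconds uint) time.Duration {
	return time.Duration(seconds) * time.Second
}

// withinDepth reports whether depth is allowed for a given MaxDepth, where
// 0 means unlimited. Treating depth == maxDepth as allowed is an assumption.
func withinDepth(maxDepth, depth uint) bool {
	return maxDepth == 0 || depth <= maxDepth
}

// reencodeImages reports whether an ImageQuality value asks for re-encoding:
// 0 disables it, and 1 to 100 is taken as a quality percentage.
func reencodeImages(quality uint) bool {
	return quality > 0 && quality <= 100
}

func main() {
	fmt.Println(requestTimeout(30))
	fmt.Println(withinDepth(0, 99), withinDepth(2, 3))
	fmt.Println(reencodeImages(0), reencodeImages(85))
}
```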

type Cookie ¶

type Cookie struct {
	Name  string `json:"name"`
	Value string `json:"value,omitempty"`

	Expires *time.Time `json:"expires,omitempty"`
}

Cookie represents a cookie. It copies parts of the http.Cookie struct but changes the JSON marshaling to exclude empty fields.
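To see the effect of the omitempty tags, the following sketch declares a local copy of the struct shown above and marshals it with encoding/json. The `encode` helper and the sample values are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Cookie is a local copy of the struct definition shown above.
type Cookie struct {
	Name  string `json:"name"`
	Value string `json:"value,omitempty"`

	Expires *time.Time `json:"expires,omitempty"`
}

// encode marshals a cookie to JSON and returns it as a string.
func encode(c Cookie) string {
	b, err := json.Marshal(c)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	expires := time.Date(2030, time.January, 1, 0, 0, 0, 0, time.UTC)

	// Empty Value and nil Expires are left out of the output.
	fmt.Println(encode(Cookie{Name: "session"}))
	// prints {"name":"session"}

	fmt.Println(encode(Cookie{Name: "session", Value: "abc", Expires: &expires}))
	// prints {"name":"session","value":"abc","expires":"2030-01-01T00:00:00Z"}
}
```

Name has no omitempty tag, so it is always present in the output, even when empty.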

type Scraper ¶

type Scraper struct {
	URL *url.URL // contains the main URL to parse, will be modified in case of a redirect
	// contains filtered or unexported fields
}

Scraper contains all scraping data.

func New ¶

func New(logger *log.Logger, cfg Config) (*Scraper, error)

New creates a new Scraper instance.

func (*Scraper) Cookies ¶ added in v0.2.0

func (s *Scraper) Cookies() []Cookie

Cookies returns the current cookies.

func (*Scraper) Start ¶

func (s *Scraper) Start(ctx context.Context) error

Start starts the scraping.
