scraper

package
v0.1.0
Warning

This package is not in the latest version of its module.

Published: Jan 12, 2020 License: MIT Imports: 23 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var (
	// PageExtension is the file extension that downloaded pages get
	PageExtension = ".html"
	// PageDirIndex is the file name of the index file for every dir
	PageDirIndex = "index" + PageExtension
)

Functions

func GetPageFilePath

func GetPageFilePath(url *url.URL) string

GetPageFilePath returns a filename for a URL that represents a page.
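The sketch below shows one way a page URL could be mapped to a file name using the two variables above. It is an illustration under assumptions, not the package's implementation: `pageFilePath` and `pageFilePathFor` are hypothetical helpers, and the rules (empty path or trailing slash gets the directory index, a path without an extension gets `.html`, any other extension is kept) are a guess at reasonable behavior.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// Local copies of the documented values, so the sketch is self-contained.
const (
	pageExtension = ".html"
	pageDirIndex  = "index" + pageExtension
)

// pageFilePath is a hypothetical stand-in for GetPageFilePath.
// The mapping rules are assumptions made for this example.
func pageFilePath(u *url.URL) string {
	name := u.Path
	switch {
	case name == "":
		// No path at all: store the page as the directory index.
		return pageDirIndex
	case strings.HasSuffix(name, "/"):
		// Directory-style URL: place an index file inside it.
		return name + pageDirIndex
	case path.Ext(name) == "":
		// No file extension: give the page the default one.
		return name + pageExtension
	default:
		// An extension is already present: keep the name unchanged.
		return name
	}
}

// pageFilePathFor parses raw and applies pageFilePath.
// It returns an empty string if raw is not a valid URL.
func pageFilePathFor(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	return pageFilePath(u)
}

func main() {
	for _, raw := range []string{
		"https://example.org",
		"https://example.org/docs/",
		"https://example.org/about",
		"https://example.org/feed.xml",
	} {
		fmt.Printf("%s -> %s\n", raw, pageFilePathFor(raw))
	}
}
```

Keeping the directory index name derived from the extension, as the package variables do, means both change together if the extension is ever overridden.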

Types

type Scraper

type Scraper struct {
	// Configuration
	ImageQuality    uint
	MaxDepth        uint
	OutputDirectory string
	Username        string
	Password        string

	URL *url.URL
	// contains filtered or unexported fields
}

Scraper contains all scraping data: the exported configuration fields, the start URL, and unexported internal state.

func New

func New(logger *zap.Logger, startURL string) (*Scraper, error)

New creates a new Scraper instance for startURL that logs through logger.

func (*Scraper) GetFilePath

func (s *Scraper) GetFilePath(url *url.URL, isAPage bool) string

GetFilePath returns the file path under which the content of a URL is stored.

func (*Scraper) RemoveAnchor

func (s *Scraper) RemoveAnchor(path string) string

RemoveAnchor removes anchors from URLs.
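As a rough illustration of anchor removal, the sketch below cuts a path at the first `#`. `stripAnchor` is a hypothetical helper written for this example. Whether RemoveAnchor handles edge cases the same way is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// stripAnchor is a hypothetical helper, not the package's RemoveAnchor.
// It returns p without the fragment that starts at the first '#'.
func stripAnchor(p string) string {
	if i := strings.IndexByte(p, '#'); i >= 0 {
		return p[:i]
	}
	// No anchor present: return the input unchanged.
	return p
}

func main() {
	fmt.Println(stripAnchor("/guide/install.html#linux")) // /guide/install.html
	fmt.Println(stripAnchor("/guide/install.html"))       // /guide/install.html
}
```

Dropping the fragment lets links such as `page.html#a` and `page.html#b` resolve to the same stored file.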

func (*Scraper) SetExcludes

func (s *Scraper) SetExcludes(excludes []string) error

SetExcludes sets and checks the exclusion regular expressions.

func (*Scraper) SetIncludes

func (s *Scraper) SetIncludes(includes []string) error

SetIncludes sets and checks the inclusion regular expressions.
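Both setters take pattern strings and return an error, which suggests the patterns are compiled up front. The sketch below shows that idea with the standard `regexp` package. `compileAll` and `allowed` are hypothetical helpers, and the filtering policy (excludes win over includes, an empty include list allows everything) is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileAll compiles every pattern and fails on the first invalid one.
// It is a hypothetical helper for this example.
func compileAll(patterns []string) ([]*regexp.Regexp, error) {
	compiled := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("compiling %q: %w", p, err)
		}
		compiled = append(compiled, re)
	}
	return compiled, nil
}

// allowed applies an assumed policy: any matching exclude rejects the
// path; otherwise it passes if there are no includes or one matches.
func allowed(p string, includes, excludes []*regexp.Regexp) bool {
	for _, re := range excludes {
		if re.MatchString(p) {
			return false
		}
	}
	if len(includes) == 0 {
		return true
	}
	for _, re := range includes {
		if re.MatchString(p) {
			return true
		}
	}
	return false
}

func main() {
	includes, err := compileAll([]string{`^/docs/`})
	if err != nil {
		panic(err)
	}
	excludes, err := compileAll([]string{`\.pdf$`})
	if err != nil {
		panic(err)
	}
	for _, p := range []string{"/docs/intro", "/docs/manual.pdf", "/blog/post"} {
		fmt.Printf("%s allowed: %v\n", p, allowed(p, includes, excludes))
	}
}
```

Compiling once and reporting the offending pattern surfaces a malformed expression before any request is made.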

func (*Scraper) Start

func (s *Scraper) Start() error

Start starts the scraping.
