package scraper

Version: v0.1.0 | Published: Jan 12, 2020 | License: MIT | Imports: 23 | Imported by: 0

Details

- Valid go.mod file
- Redistributable license
- Tagged version
- Stable version: not met (a module is considered stable from major version v1; this release is v0.1.0)

Repository

github.com/cornelk/goscrape

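Documentation

Index

Variables
func GetPageFilePath(url *url.URL) string
type Scraper
- func New(logger *zap.Logger, startURL string) (*Scraper, error)
- func (s *Scraper) GetFilePath(url *url.URL, isAPage bool) string
- func (s *Scraper) RemoveAnchor(path string) string
- func (s *Scraper) SetExcludes(excludes []string) error
- func (s *Scraper) SetIncludes(includes []string) error
- func (s *Scraper) Start() error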

Constants

This section is empty.

Variables

var (
	// PageExtension is the file extension given to downloaded pages.
	PageExtension = ".html"
	// PageDirIndex is the file name of the index file for every directory.
	PageDirIndex = "index" + PageExtension
)

Functions

func GetPageFilePath

func GetPageFilePath(url *url.URL) string

GetPageFilePath returns a filename for a URL that represents a page.

Types

type Scraper

type Scraper struct {
	// Configuration
	ImageQuality    uint
	MaxDepth        uint
	OutputDirectory string
	Username        string
	Password        string

	URL *url.URL
	// contains filtered or unexported fields
}

Scraper contains all scraping data, including the configuration and the start URL.
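The Username and Password fields suggest credentials for protected sites. Assuming they are sent as HTTP basic authentication (this documentation does not confirm it), a request could be prepared with the standard library's Request.SetBasicAuth. The names authConfig and newRequest are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// authConfig is a hypothetical mirror of Scraper's Username and Password fields.
type authConfig struct {
	Username string
	Password string
}

// newRequest builds a GET request and, assuming the scraper uses HTTP basic
// authentication, attaches the credentials only when a username is set.
func newRequest(cfg authConfig, rawURL string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}
	if cfg.Username != "" {
		req.SetBasicAuth(cfg.Username, cfg.Password)
	}
	return req, nil
}

func main() {
	req, err := newRequest(authConfig{Username: "alice", Password: "secret"}, "https://example.com/")
	if err != nil {
		panic(err)
	}
	user, pass, ok := req.BasicAuth()
	fmt.Println(user, pass, ok)
}
```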

func New

func New(logger *zap.Logger, startURL string) (*Scraper, error)

New creates a new Scraper instance that starts at startURL and logs through logger.
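New receives the start URL as a string, while Scraper exposes it as URL *url.URL, so New presumably parses the string and reports failures through its error result. The sketch below shows one such validation step; parseStartURL and its "absolute http or https URL only" rule are assumptions, not the package's code.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// parseStartURL is a hypothetical validation step for New's startURL
// argument. Assumption: only absolute http or https URLs are accepted.
func parseStartURL(startURL string) (*url.URL, error) {
	u, err := url.Parse(startURL)
	if err != nil {
		return nil, fmt.Errorf("parsing start URL: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("unsupported scheme %q in start URL", u.Scheme)
	}
	if u.Host == "" {
		return nil, errors.New("start URL has no host")
	}
	return u, nil
}

func main() {
	for _, raw := range []string{"https://example.com/docs/", "example.com", "http://"} {
		u, err := parseStartURL(raw)
		if err != nil {
			fmt.Println(raw, "error:", err)
			continue
		}
		fmt.Println(raw, "host:", u.Host)
	}
}
```

Note that url.Parse accepts "example.com" without error (it becomes a relative path), which is why an explicit scheme and host check is needed.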

func (*Scraper) GetFilePath

func (s *Scraper) GetFilePath(url *url.URL, isAPage bool) string

GetFilePath returns the file path in which to store the content of the given URL.

func (*Scraper) RemoveAnchor

func (s *Scraper) RemoveAnchor(path string) string

RemoveAnchor removes the anchor (the part starting with #) from a URL path.
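Links that differ only in their anchor usually point to the same document, so stripping the anchor lets them be treated as one page. A minimal sketch of that step, using a hypothetical removeAnchor that cuts at the first #; the package's own handling of edge cases may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// removeAnchor is a hypothetical equivalent of RemoveAnchor: it drops the
// first '#' and everything after it, leaving the path otherwise unchanged.
func removeAnchor(p string) string {
	if i := strings.IndexByte(p, '#'); i >= 0 {
		return p[:i]
	}
	return p
}

func main() {
	fmt.Println(removeAnchor("/docs/page.html#install")) // prints "/docs/page.html"
	fmt.Println(removeAnchor("/docs/"))                  // prints "/docs/"
}
```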

func (*Scraper) SetExcludes

func (s *Scraper) SetExcludes(excludes []string) error

SetExcludes sets the exclusion regular expressions and checks that they are valid.
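"Sets and checks" can be read as compiling each expression and failing on the first invalid one, which is what the error result would report. A sketch with the standard regexp package, where compilePatterns is a hypothetical helper rather than part of the package:

```go
package main

import (
	"fmt"
	"regexp"
)

// compilePatterns is a hypothetical helper that compiles every expression
// and returns an error naming the first one that fails to compile.
func compilePatterns(exprs []string) ([]*regexp.Regexp, error) {
	res := make([]*regexp.Regexp, 0, len(exprs))
	for _, e := range exprs {
		re, err := regexp.Compile(e)
		if err != nil {
			return nil, fmt.Errorf("invalid regular expression %q: %w", e, err)
		}
		res = append(res, re)
	}
	return res, nil
}

func main() {
	if _, err := compilePatterns([]string{`\.pdf$`, `(`}); err != nil {
		fmt.Println(err)
	}
}
```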

func (*Scraper) SetIncludes

func (s *Scraper) SetIncludes(includes []string) error

SetIncludes sets the inclusion regular expressions and checks that they are valid.

func (*Scraper) Start

func (s *Scraper) Start() error

Start starts the scraping process.
