crawler

package
v0.0.0-...-3f03cec
Published: Apr 23, 2024 License: MIT Imports: 8 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

Crawler is used to crawl webpages and populate a linked list with the extracted information.

func New

func New(logger *slog.Logger, reader Reader, maxWorkers int) *Crawler

New returns a Crawler that logs through logger, reads pages through reader, and caps concurrency at maxWorkers.

func (*Crawler) Start

func (c *Crawler) Start(ctx context.Context, root *url.URL) []Page

Start initiates the crawling process. Given a root URL, it spawns goroutines to crawl the links it extracts. Links are passed over a buffered channel whose capacity is the maximum provided to the crawler, so that concurrency is not unbounded. Once all the data has been extracted, it loops over the linked list on the Crawler and populates and returns a slice of Page.
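
A minimal usage sketch, assuming a hypothetical import path for this package and a hypothetical stub Reader that writes a fixed page body (a real Reader would fetch location from its source):

package main

import (
	"context"
	"fmt"
	"io"
	"log/slog"
	"net/url"
	"os"

	"example.com/crawler" // hypothetical import path for this package
)

// stubReader is a hypothetical Reader that writes a fixed page body;
// a real implementation would fetch location from its source.
type stubReader struct{}

func (stubReader) ReadPage(_ context.Context, _ string, data io.Writer) error {
	_, err := io.WriteString(data, `<a href="https://example.com/about">about</a>`)
	return err
}

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))

	c := crawler.New(logger, stubReader{}, 10) // 10 workers is an arbitrary choice

	root, err := url.Parse("https://example.com")
	if err != nil {
		logger.Error("parsing root URL", "error", err)
		os.Exit(1)
	}

	for _, page := range c.Start(context.Background(), root) {
		fmt.Printf("%s (%d links)\n", page.URL, len(page.Links))
	}
}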

type Page

type Page struct {
	URL   string
	Links []string
}

Page holds the information extracted from a given page.

type Reader

type Reader interface {
	ReadPage(ctx context.Context, location string, data io.Writer) error
}

Reader is the interface the Crawler uses to read page data from a given source.
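
A sketch of one possible implementation, assuming a Reader backed by net/http; HTTPReader is hypothetical and not part of this package:

package httpreader

import (
	"context"
	"fmt"
	"io"
	"net/http"
)

// HTTPReader is a hypothetical Reader that fetches pages over HTTP.
type HTTPReader struct {
	Client *http.Client
}

// ReadPage issues a GET request for location and copies the response body into data.
func (r *HTTPReader) ReadPage(ctx context.Context, location string, data io.Writer) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, location, nil)
	if err != nil {
		return fmt.Errorf("building request for %s: %w", location, err)
	}

	resp, err := r.Client.Do(req)
	if err != nil {
		return fmt.Errorf("fetching %s: %w", location, err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d for %s", resp.StatusCode, location)
	}

	_, err = io.Copy(data, resp.Body)
	return err
}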
