crawler

package
v0.0.0-...-3f03cec
Published: Apr 23, 2024 License: MIT Imports: 8 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

Crawler is used to crawl webpages and populate a linked list with the extracted information.

func New

func New(logger *slog.Logger, reader Reader, maxWorkers int) *Crawler

New returns a Crawler that logs through logger, reads pages through reader, and caps concurrency at maxWorkers.

func (*Crawler) Start

func (c *Crawler) Start(ctx context.Context, root *url.URL) []Page

Start initiates the crawling process. Given a root URL, it spawns goroutines to crawl the links it extracts. Links are passed over a buffered channel whose capacity is the maximum provided to the crawler, so that concurrency is not unbounded. Once all the data has been extracted, it loops over the linked list on the Crawler and populates and returns a slice of Page.
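
A minimal usage sketch, assuming a hypothetical import path for this package and a hypothetical stub Reader that writes a fixed page body (a real Reader would fetch location from its source):

package main

import (
	"context"
	"fmt"
	"io"
	"log/slog"
	"net/url"
	"os"

	"example.com/crawler" // hypothetical import path for this package
)

// stubReader is a hypothetical Reader that writes a fixed page body;
// a real implementation would fetch location from its source.
type stubReader struct{}

func (stubReader) ReadPage(_ context.Context, _ string, data io.Writer) error {
	_, err := io.WriteString(data, `<a href="https://example.com/about">about</a>`)
	return err
}

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))

	c := crawler.New(logger, stubReader{}, 10) // 10 workers is an arbitrary choice

	root, err := url.Parse("https://example.com")
	if err != nil {
		logger.Error("parsing root URL", "error", err)
		os.Exit(1)
	}

	for _, page := range c.Start(context.Background(), root) {
		fmt.Printf("%s (%d links)\n", page.URL, len(page.Links))
	}
}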

type Page

type Page struct {
	URL   string
	Links []string
}

Page holds the information extracted from a given page.

type Reader

type Reader interface {
	ReadPage(ctx context.Context, location string, data io.Writer) error
}

Reader is the interface the Crawler uses to read page data from a given source.
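
A sketch of one possible implementation, assuming a Reader backed by net/http; HTTPReader is hypothetical and not part of this package:

package httpreader

import (
	"context"
	"fmt"
	"io"
	"net/http"
)

// HTTPReader is a hypothetical Reader that fetches pages over HTTP.
type HTTPReader struct {
	Client *http.Client
}

// ReadPage issues a GET request for location and copies the response body into data.
func (r *HTTPReader) ReadPage(ctx context.Context, location string, data io.Writer) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, location, nil)
	if err != nil {
		return fmt.Errorf("building request for %s: %w", location, err)
	}

	resp, err := r.Client.Do(req)
	if err != nil {
		return fmt.Errorf("fetching %s: %w", location, err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d for %s", resp.StatusCode, location)
	}

	_, err = io.Copy(data, resp.Body)
	return err
}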
