cuvva

package module
v0.0.0-...-fbfe835 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 20, 2021 License: Apache-2.0 Imports: 9 Imported by: 0

README

cuvva

This is a Golang package that implements a simple web crawler.

The package is developed by Andrey Porubov (mailto:aporubov@gmail.com).

This is the solution to the coding challenge and it should not be used for any purposes other than educational.

Please note that I don't have vast production experience with Golang, it's not my primary language at the moment and I don't use Golang on every day basis.

Pre-requisites

This package relies on x/net/html package for parsing HTML which isn't included in the standard Golang distribution, thus it should be fetched first:

go get golang.org/x/net/html

Usage

import (
    "net/url"
    "github.com/aporubov/cuvva"
)

Construct the crawler providing the entry URL, then invoke the Crawl method to perform the crawling and Print method to pretty-print the generated sitemap, for example:

URL, _ := url.Parse("https://www.cuvva.com")
crawler := cuvva.NewCrawler()
crawler.Crawl(URL)
crawler.Print()

The example above should produce the output similar to

...
(webpage) https://www.cuvva.com/support/cuvva-privacy-policy
    Links
        (webpage) https://www.cuvva.com/how-insurance-works
        (webpage) https://www.cuvva.com/support
        (webpage) https://www.cuvva.com/support/cuvva-privacy-policy
        (webpage) https://www.cuvva.com/about
        (webpage) https://www.cuvva.com/
        (webpage) https://www.cuvva.com/car-insurance/subscription
        (webpage) https://www.cuvva.com/get-an-estimate
        (webpage) https://www.cuvva.com/car-insurance/temporary
        (webpage) https://www.cuvva.com/car-insurance/temporary-van-insurance
        (webpage) https://www.cuvva.com/car-insurance/learner-driver
        (webpage) https://www.cuvva.com/free-car-checker
        (webpage) https://www.cuvva.com/insurance-groups
        (webpage) https://www.cuvva.com/news
        (webpage) https://www.cuvva.com/careers
        (webpage) https://www.cuvva.com/support/cuvva-cookie-policy
        (webpage) https://www.cuvva.com/support/cuvvas-terms-conditions
        (webpage) https://www.cuvva.com/support/contacting-support
        (webpage) https://www.cuvva.com/support/how-is-cuvva-regulated
        (webpage) https://www.cuvva.com/category/data-security
    Assets
        (image/vnd.microsoft.icon) https://www.cuvva.com/favicon.98576fa3.ico
        (application/javascript; charset=utf-8) https://www.cuvva.com/website.11813109.js
(webpage) https://www.cuvva.com/cuvva/how-do-our-product-teams-work-with-cops
    Links
        (webpage) https://www.cuvva.com/
        (webpage) https://www.cuvva.com/support
        (webpage) https://www.cuvva.com/how-insurance-works
        (webpage) https://www.cuvva.com/about
        (webpage) https://www.cuvva.com/get-an-estimate
        (webpage) https://www.cuvva.com/car-insurance/temporary
        (webpage) https://www.cuvva.com/car-insurance/subscription
        (webpage) https://www.cuvva.com/cuvva/1-minute-response-time-24-7-365-days-a-year
        (webpage) https://www.cuvva.com/cuvva/how-we-test-and-roll-out-new-product-features
        (webpage) https://www.cuvva.com/careers
        (webpage) https://www.cuvva.com
        (webpage) https://www.cuvva.com/car-insurance/learner-driver
        (webpage) https://www.cuvva.com/car-insurance/temporary-van-insurance
        (webpage) https://www.cuvva.com/free-car-checker
        (webpage) https://www.cuvva.com/insurance-groups
        (webpage) https://www.cuvva.com/news
        (webpage) https://www.cuvva.com/support/contacting-support
        (webpage) https://www.cuvva.com/support/cuvva-cookie-policy
        (webpage) https://www.cuvva.com/support/cuvva-privacy-policy
        (webpage) https://www.cuvva.com/support/cuvvas-terms-conditions
    Assets
        (application/javascript; charset=utf-8) https://www.cuvva.com/website.11813109.js
        (image/vnd.microsoft.icon) https://www.cuvva.com/favicon.98576fa3.ico
...

Please note that the crawler is limited to one domain from the entry point URL, so when crawling cuvva.com, it would crawl all pages within the cuvva.com domain, but not follow the links to other websites (e.g. Twitter/Facebook accounts).

Tests

There is only one basic test at the moment that constructs and runs the crawler for the cuvva.com domain, exactly as it's described in Usage section.

go test

TODO: Add more tests.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	URL *url.URL
	// contains filtered or unexported fields
}

Crawler struct represents main crawling context

func NewCrawler

func NewCrawler() *Crawler

NewCrawler instantiates a new Crawler

func (*Crawler) Crawl

func (c *Crawler) Crawl(URL *url.URL)

Crawl performs the crawling

func (*Crawler) Print

func (c *Crawler) Print()

Print prints generated sitemap

type Resource

type Resource struct {
	URL          *url.URL
	References   []*Resource
	ContentType  string
	StatusCode   int
	LastModified string
	Error        error
}

Resource struct represents any web resource

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL