aim

package module
v1.4.1 Latest
Warning

This package is not in the latest version of its module.

Published: Sep 30, 2021 License: Apache-2.0 Imports: 8 Imported by: 0

README

🔍 Aim - SEO Meta Crawler

Elegant scraper with sensible defaults for crawling websites and collecting SEO metrics.

Aim provides a crawling interface built on Colly, the lightning fast scraping framework for Gophers.

Features

🏃  Fast — uses Colly under the hood.

🦄  Easy to use — sensible defaults for most use cases.

🛠  Extendable — hooks are used to make crawled data available anywhere.

👀  Takes a close look — see the CrawlerResponse type for the full list of fields aim collects.

Installation

Add aim to your go.mod file:

module github.com/x/y

go 1.17

require (
        github.com/tim-richter/aim latest
)

Example

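package main

import (
    "fmt"

    "github.com/tim-richter/aim"
)

func main() {
    // Crawl up to three links deep, with at most two parallel requests per matching domain.
    crawl := aim.NewCrawl(
        "https://timrichter.dev",
        aim.MaxDepth(3),
        aim.Limit(aim.LimitRule{DomainGlob: "*", Parallelism: 2}))

    // OnResponse receives the data collected for a crawled page.
    crawl.OnResponse = func(response aim.CrawlerResponse) {
        fmt.Printf("%+v\n", response)
    }

    crawl.Start()
}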

Bugs

Bugs or suggestions? Visit the issue tracker

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetH2 added in v1.3.0

func GetH2(h2s []string) (string, string)

func GetInlinks

func GetInlinks(domain string, links []string) []string

func GetOutlinks

func GetOutlinks(domain string, links []string) []string
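
The signatures of GetInlinks and GetOutlinks suggest that both filter a page's links by domain, but the page does not document their behavior. The sketch below shows one way such a split could work using only the standard library. The helpers sameSite and splitLinks are hypothetical and not part of aim, and the sketch assumes the domain is passed as a bare host name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sameSite reports whether link points at domain or one of its subdomains.
// Hypothetical helper, not part of aim. Relative and unparsable links are
// reported as not same-site; resolve them against a base URL first.
func sameSite(domain, link string) bool {
	u, err := url.Parse(link)
	if err != nil || u.Hostname() == "" {
		return false
	}
	host := strings.ToLower(u.Hostname())
	domain = strings.ToLower(domain)
	return host == domain || strings.HasSuffix(host, "."+domain)
}

// splitLinks partitions links into internal (same site) and external ones.
// Assumption: this mirrors what GetInlinks and GetOutlinks return separately.
func splitLinks(domain string, links []string) (internal, external []string) {
	for _, link := range links {
		if sameSite(domain, link) {
			internal = append(internal, link)
		} else {
			external = append(external, link)
		}
	}
	return internal, external
}

func main() {
	links := []string{
		"https://timrichter.dev/blog",
		"https://github.com/tim-richter/aim",
	}
	internal, external := splitLinks("timrichter.dev", links)
	fmt.Println("internal:", internal)
	fmt.Println("external:", external)
}
```

Comparing parsed host names rather than raw string prefixes avoids treating a look-alike host such as nottimrichter.dev as internal.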

func WordCount added in v1.3.0

func WordCount(value string) int
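
How WordCount tokenizes its input is not documented here. As a rough illustration, a whitespace-based count can be written with strings.Fields. The function countWords below is a hypothetical stand-in, and the real WordCount may treat punctuation or markup differently.

```go
package main

import (
	"fmt"
	"strings"
)

// countWords counts whitespace-separated tokens in text.
// Hypothetical stand-in; aim.WordCount may use different rules.
func countWords(text string) int {
	return len(strings.Fields(text))
}

func main() {
	fmt.Println(countWords("Elegant scraper with sensible defaults"))
}
```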

Types

type Crawler

type Crawler struct {
	Domain         string
	UserAgent      string
	MaxDepth       int
	UseSitemap     bool
	AllowedDomains []string
	Limits         []colly.LimitRule
	CacheDir       string
	OnResponse     func(response CrawlerResponse)
	OnImage        func(image CrawlerResponseImage)
}

func NewCrawl

func NewCrawl(domain string, options ...CrawlerOption) *Crawler

func (*Crawler) Init

func (crawler *Crawler) Init()

func (*Crawler) Start

func (crawler *Crawler) Start() *Crawler

type CrawlerOption

type CrawlerOption func(*Crawler)

func AllowedDomains

func AllowedDomains(domains ...string) CrawlerOption

func Limit

func Limit(rule LimitRule) CrawlerOption

func MaxDepth

func MaxDepth(depth int) CrawlerOption

func UseSitemap

func UseSitemap(use bool) CrawlerOption

func UserAgent

func UserAgent(ua string) CrawlerOption
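
CrawlerOption is a function that mutates a *Crawler, which is the shape of the functional options pattern: NewCrawl can start from defaults and apply each option in order. The sketch below reproduces that pattern with local stand-in types so it compiles without the module. The names crawler, option, withMaxDepth, withUserAgent and newCrawler, as well as the default values, are assumptions for illustration and are not taken from aim.

```go
package main

import "fmt"

// crawler is a local stand-in for aim.Crawler with a few of its fields.
type crawler struct {
	domain    string
	userAgent string
	maxDepth  int
}

// option has the same shape as aim.CrawlerOption: a function that
// modifies the crawler it is given.
type option func(*crawler)

// withMaxDepth returns an option that sets the maximum crawl depth.
func withMaxDepth(depth int) option {
	return func(c *crawler) { c.maxDepth = depth }
}

// withUserAgent returns an option that sets the User-Agent string.
func withUserAgent(ua string) option {
	return func(c *crawler) { c.userAgent = ua }
}

// newCrawler starts from defaults (invented for this sketch) and then
// applies the options in the order they were passed.
func newCrawler(domain string, opts ...option) *crawler {
	c := &crawler{domain: domain, userAgent: "example-bot", maxDepth: 1}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	c := newCrawler("https://timrichter.dev", withMaxDepth(3))
	fmt.Printf("%+v\n", *c)
}
```

Because options are applied in order, a later option overrides an earlier one that sets the same field, and callers only name the settings they want to change.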

type CrawlerResponse

type CrawlerResponse struct {
	URL                   string
	ContentType           string
	StatusCode            int
	Status                string
	H1                    string
	H1Length              int
	H2One                 string
	H2OneLength           int
	H2Two                 string
	H2TwoLength           int
	MetaDescription       string
	MetaDescriptionLength int
	Size                  int
	WordCount             int
	CrawlDepth            int
	Inlinks               []string
	InlinksCount          int
	Outlinks              []string
	OutlinksCount         int

	Canonicals []string

	Amp    CrawlerResponseAmp
	Images []CrawlerResponseImage
}
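
The fields of CrawlerResponse lend themselves to simple per-page checks inside an OnResponse hook. The sketch below runs a few such checks on a local stand-in struct, pageMetrics, that mirrors a subset of the fields above so the code compiles without the module. The audit function and its thresholds (160 characters, 300 words) are illustrative assumptions, not values defined by aim.

```go
package main

import "fmt"

// pageMetrics mirrors a few fields of aim.CrawlerResponse.
// Local stand-in; real code would accept aim.CrawlerResponse.
type pageMetrics struct {
	URL                   string
	StatusCode            int
	H1                    string
	MetaDescriptionLength int
	WordCount             int
}

// audit returns human-readable findings for one page.
// The thresholds are assumptions chosen for illustration.
func audit(p pageMetrics) []string {
	var findings []string
	if p.StatusCode >= 400 {
		findings = append(findings, fmt.Sprintf("status %d", p.StatusCode))
	}
	if p.H1 == "" {
		findings = append(findings, "missing H1")
	}
	if p.MetaDescriptionLength == 0 {
		findings = append(findings, "missing meta description")
	} else if p.MetaDescriptionLength > 160 {
		findings = append(findings, "meta description longer than 160 characters")
	}
	if p.WordCount < 300 {
		findings = append(findings, "fewer than 300 words")
	}
	return findings
}

func main() {
	page := pageMetrics{
		URL:                   "https://timrichter.dev/blog",
		StatusCode:            200,
		H1:                    "",
		MetaDescriptionLength: 180,
		WordCount:             120,
	}
	for _, finding := range audit(page) {
		fmt.Println(page.URL, "-", finding)
	}
}
```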

type CrawlerResponseAmp added in v1.3.0

type CrawlerResponseAmp struct {
	IsAmp        bool
	HasAmp       bool
	AmpLink      string
	OriginalLink string
}

func GetAmpMeta added in v1.3.0

func GetAmpMeta(e *colly.HTMLElement) CrawlerResponseAmp

type CrawlerResponseImage added in v1.3.0

type CrawlerResponseImage struct {
	ParentPage string
	URL        string
	StatusCode int
	StatusText string
	Alt        string
	Size       int
}

func GetImageMeta added in v1.3.0

func GetImageMeta(e *colly.HTMLElement) []CrawlerResponseImage
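
The Alt and StatusCode fields of CrawlerResponseImage are enough to flag images that need attention, for example from an OnImage hook or by looping over CrawlerResponse.Images. The sketch below uses a local stand-in struct, imageMeta, so it runs without the module. The helper needsAttention is hypothetical, and treating status codes of 400 and above as broken is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// imageMeta mirrors a few fields of aim.CrawlerResponseImage.
// Local stand-in; real code would use the aim type directly.
type imageMeta struct {
	ParentPage string
	URL        string
	StatusCode int
	Alt        string
}

// needsAttention returns the images that have no alt text or that
// responded with an error status (assumed here to be 400 and above).
func needsAttention(images []imageMeta) []imageMeta {
	var flagged []imageMeta
	for _, img := range images {
		if strings.TrimSpace(img.Alt) == "" || img.StatusCode >= 400 {
			flagged = append(flagged, img)
		}
	}
	return flagged
}

func main() {
	images := []imageMeta{
		{ParentPage: "/", URL: "/logo.png", StatusCode: 200, Alt: "Site logo"},
		{ParentPage: "/", URL: "/hero.jpg", StatusCode: 200, Alt: ""},
		{ParentPage: "/blog", URL: "/old.png", StatusCode: 404, Alt: "Diagram"},
	}
	for _, img := range needsAttention(images) {
		fmt.Printf("%s on %s (status %d, alt %q)\n", img.URL, img.ParentPage, img.StatusCode, img.Alt)
	}
}
```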

type LimitRule

type LimitRule = colly.LimitRule
