scraper

Version: v0.0.0-...-f553d4a
Published: Dec 19, 2016 License: MIT Imports: 10 Imported by: 0

A simple, lightweight, CSS-based web scraper written in Go.

Documentation

Types

type FileGetter

type FileGetter map[string]string

FileGetter is a Getter that retrieves file data for the specified URL key.

func (FileGetter) Get

func (c FileGetter) Get(url string, srcURL string) (io.ReadCloser, error)

Get looks up the file data for the specified URL.

type Getter

type Getter interface {
	Get(url string, srcURL string) (io.ReadCloser, error)
}

Getter is an abstraction over HTTP GET that allows a Scraper's data source to be changed.

func HTTPGetter

func HTTPGetter() Getter

HTTPGetter creates a Getter that retrieves URLs over HTTP.

type Logger

type Logger interface {
	Printf(format string, v ...interface{})
}

Logger defines a simple logging interface (compatible with the standard library's log package).

type MemoryGetter

type MemoryGetter map[string]string

MemoryGetter is a Getter that retrieves strings by the specified URL key.

func (MemoryGetter) Get

func (c MemoryGetter) Get(url string, srcURL string) (io.ReadCloser, error)

Get looks up the string data for the specified URL.

type Scraper

type Scraper interface {
	Filter(selector string) Scraper
	Select(selector Sel) Scraper
	Follow(selector string) Scraper
	Done() ([]map[string]string, error)
}

Scraper defines a simple web scraper's functionality

func Get

func Get(url string) Scraper

Get creates a new scraper by retrieving the HTML at the given URL

Example
results, err :=
	Get("https://golang.org/").
		Select(Sel{
			"title": "#heading-wide a",
		}).
		Follow(".read a[href]").
		Select(Sel{
			"blogTitle": "h1 a",
		}).
		Done()

if err != nil {
	panic(err)
}

fmt.Println(len(results))
fmt.Println(results[0]["title"])
fmt.Println(results[0]["blogTitle"])
// Output:
// 2
// The Go Programming Language
// The Go Blog

func New

func New(url string, logger Logger, getter Getter) Scraper

New creates a new scraper by using the data provided by the specified Getter

type Sel

type Sel map[string]string

Sel (Selector) is a simple key-value map from property names to the CSS selectors used to extract their values.
