crawler

package
v0.13.1
Published: May 25, 2022 License: Apache-2.0 Imports: 18 Imported by: 0

Documentation

Overview

Package crawler implements a STAC resource crawler.

Constants

This section is empty.

Variables

var DefaultOptions = &Options{
	Context:      context.Background(),
	Recursion:    Children,
	Concurrency:  runtime.GOMAXPROCS(0),
	ErrorHandler: func(err error) error { return err },
}

DefaultOptions holds the options used by default when creating a new crawler.

Functions

func Crawl added in v0.11.0

func Crawl(resource string, visitor Visitor, options ...*Options) error

Crawl calls the visitor for each resolved resource.

The resource can be a file path or a URL. Any error returned by visitor will stop crawling and be returned by this function. Context cancellation will also stop crawling and the context error will be returned.
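
As a rough illustration of the contract described above, the sketch below defines a visitor and drives it with a small in-memory loop. The Resource and Visitor types are local copies of the documented declarations, and visitAll, limitedVisitor, and errLimit are hypothetical names introduced for this example. visitAll is only a stand-in for Crawl (it does no fetching or link resolution); the point is that the first error returned by the visitor ends the traversal and is handed back to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the documented types, so the sketch compiles on its own.
type Resource map[string]interface{}
type Visitor func(string, Resource) error

// errLimit is a hypothetical sentinel error used to stop early.
var errLimit = errors.New("visit limit reached")

// visitAll is a stand-in for Crawl: it calls the visitor for each location
// in order and returns the first error. It is not the library's implementation.
func visitAll(locations []string, resources map[string]Resource, visitor Visitor) error {
	for _, location := range locations {
		if err := visitor(location, resources[location]); err != nil {
			return err
		}
	}
	return nil
}

// limitedVisitor returns a visitor that records each location it sees and
// returns errLimit once max locations have been recorded.
func limitedVisitor(max int, seen *[]string) Visitor {
	return func(location string, resource Resource) error {
		if len(*seen) >= max {
			return errLimit
		}
		*seen = append(*seen, location)
		return nil
	}
}

func main() {
	resources := map[string]Resource{
		"catalog.json": {"id": "root"},
		"a/item.json":  {"id": "a"},
		"b/item.json":  {"id": "b"},
	}
	order := []string{"catalog.json", "a/item.json", "b/item.json"}

	var seen []string
	err := visitAll(order, resources, limitedVisitor(2, &seen))
	fmt.Println(seen, errors.Is(err, errLimit))
}
```

With the real package, a function shaped like the one returned by limitedVisitor would be passed as the visitor argument to Crawl.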

func Join added in v0.11.0

func Join(resource string, visitor Visitor, options ...*Options) error

Join adds a crawler to an existing crawl instead of initiating a new one.

This is useful when two crawlers share the same queue and the first crawler has been taken down or the queue is getting too large. Join will return immediately if the queue of tasks is empty.

func LinkTypeAnyJSON added in v0.10.0

func LinkTypeAnyJSON(link Link) bool

func LinkTypeApplicationJSON added in v0.10.0

func LinkTypeApplicationJSON(link Link) bool

func LinkTypeGeoJSON added in v0.10.0

func LinkTypeGeoJSON(link Link) bool

func LinkTypeNone added in v0.10.0

func LinkTypeNone(link Link) bool

Types

type Asset added in v0.12.0

type Asset map[string]interface{}

Asset provides metadata about data for an item.

func (Asset) Description added in v0.12.0

func (a Asset) Description() string

Description returns the asset's description.

func (Asset) Href added in v0.12.0

func (a Asset) Href() string

Href returns the asset's href.

func (Asset) Roles added in v0.12.0

func (a Asset) Roles() []string

Roles returns the asset's roles.

func (Asset) Title added in v0.12.0

func (a Asset) Title() string

Title returns the asset's title.

func (Asset) Type added in v0.12.0

func (a Asset) Type() string

Type returns the asset's type.
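
Because Asset is a map of decoded JSON values, accessors such as Href and Roles presumably have to cope with missing or mistyped fields. The sketch below shows one way that could work. It is an assumption, not the package's source: asset, href, roles, and parseAsset are hypothetical local names that only mirror the documented type and signatures.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// asset is a local stand-in with the same underlying type as the documented Asset.
type asset map[string]interface{}

// href returns the "href" value if it is a string, and "" otherwise.
func (a asset) href() string {
	value, _ := a["href"].(string)
	return value
}

// roles returns the string entries of "roles". encoding/json decodes arrays
// into []interface{}, so each entry is checked and non-strings are skipped.
func (a asset) roles() []string {
	values, _ := a["roles"].([]interface{})
	roles := make([]string, 0, len(values))
	for _, value := range values {
		if role, ok := value.(string); ok {
			roles = append(roles, role)
		}
	}
	return roles
}

// parseAsset decodes a JSON object into an asset.
func parseAsset(data []byte) (asset, error) {
	var a asset
	if err := json.Unmarshal(data, &a); err != nil {
		return nil, err
	}
	return a, nil
}

func main() {
	a, err := parseAsset([]byte(`{"href": "image.tif", "roles": ["data", "visual"]}`))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(a.href(), a.roles())
}
```

Returning zero values for absent fields keeps callers free of type assertions, at the cost of not distinguishing a missing field from an empty one.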

type ErrorHandler added in v0.11.0

type ErrorHandler func(error) error

ErrorHandler is called with any errors during a crawl. If the function returns nil, the crawl will continue. If the function returns an error, the crawl will stop.
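
The continue-or-stop behavior can be sketched with a handler that tolerates one kind of error and propagates everything else. ErrorHandler below is a local copy of the documented type. The names errNotFound, errFatal, skipNotFound, and process are hypothetical, and process is only a stand-in loop, not how the crawler itself is implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrorHandler mirrors the documented declaration.
type ErrorHandler func(error) error

// Hypothetical errors that a fetch or decode step might produce.
var (
	errNotFound = errors.New("resource not found")
	errFatal    = errors.New("invalid JSON")
)

// skipNotFound lets the crawl continue on errNotFound and stops it on
// any other error by returning that error unchanged.
func skipNotFound(err error) error {
	if errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

// process is a stand-in for a crawl loop. Each entry in steps is the outcome
// of one unit of work: nil for success, non-nil for failure. Failures go to
// the handler, and a non-nil handler result stops the loop.
func process(steps []error, handler ErrorHandler) (int, error) {
	completed := 0
	for _, stepErr := range steps {
		if stepErr == nil {
			completed++
			continue
		}
		if err := handler(stepErr); err != nil {
			return completed, err
		}
	}
	return completed, nil
}

func main() {
	steps := []error{nil, errNotFound, nil, errFatal, nil}

	done, err := process(steps, skipNotFound)
	fmt.Println(done, err)
}
```

The handler in DefaultOptions returns its argument unchanged, which corresponds to stopping at the first error.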

type Link

type Link map[string]string

Link represents a link to a resource.

type LinkMatcher added in v0.10.0

type LinkMatcher func(link Link) bool

type Links

type Links []Link

Links is a slice of links.

func (Links) Rel added in v0.10.0

func (links Links) Rel(rel string, matchers ...LinkMatcher) Link
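
Rel and the LinkType functions are undocumented here, so the following is only a sketch of how a custom LinkMatcher might be written and applied. Link, LinkMatcher, and Links are local copies of the documented types. typeIs and firstMatch are hypothetical helpers, and firstMatch makes no claim about how Rel combines multiple matchers.

```go
package main

import "fmt"

// Local copies of the documented types.
type Link map[string]string
type LinkMatcher func(link Link) bool
type Links []Link

// typeIs returns a matcher that accepts links whose "type" field equals
// mediaType. It is a hypothetical helper, not part of the package.
func typeIs(mediaType string) LinkMatcher {
	return func(link Link) bool {
		return link["type"] == mediaType
	}
}

// firstMatch is a hypothetical helper: it returns the first link with the
// given rel that the matcher accepts, or nil if there is none.
func firstMatch(links Links, rel string, matcher LinkMatcher) Link {
	for _, link := range links {
		if link["rel"] == rel && matcher(link) {
			return link
		}
	}
	return nil
}

func main() {
	links := Links{
		{"rel": "self", "href": "https://example.com/catalog.json", "type": "application/json"},
		{"rel": "item", "href": "https://example.com/a.html", "type": "text/html"},
		{"rel": "item", "href": "https://example.com/a.json", "type": "application/geo+json"},
	}

	link := firstMatch(links, "item", typeIs("application/geo+json"))
	fmt.Println(link["href"])
}
```

Since a matcher is just a function of a Link, a closure like the one returned by typeIs should be usable anywhere a LinkMatcher is accepted.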

type Options

type Options struct {
	// Optional context.  If provided, the crawler will stop when the context is done.
	Context context.Context

	// Limit to the number of resources to fetch and visit concurrently.
	Concurrency int

	// Strategy to use when crawling linked resources.  Use None to visit
	// a single resource.  Use Children to only visit linked item/child resources.
	Recursion RecursionType

	// Optional function to limit which resources to crawl.  If provided, the function
	// will be called with the URL or absolute path to a resource before it is crawled.
	// If the function returns false, the resource will not be read and the visitor will
	// not be called.
	Filter func(string) bool

	// Optional function to handle any errors during the crawl.  By default, any error
	// will stop the crawl.  To continue crawling on error, provide a function that
	// returns nil.
	ErrorHandler ErrorHandler

	// Optional queue to use for crawling tasks.  If not provided, an in-memory queue
	// will be used.  When running a crawl across multiple processes, it can be useful
	// to provide a queue that is shared across processes.
	Queue workgroup.Queue[*Task]
}

Options for creating a crawler.
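
The Filter field takes a plain func(string) bool, so a filter can be built and checked without the crawler. The sketch below restricts a crawl to locations under one prefix. underPrefix and the example.com URLs are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// underPrefix returns a function with the same signature as Options.Filter.
// It accepts only locations that start with prefix. Hypothetical helper.
func underPrefix(prefix string) func(string) bool {
	return func(location string) bool {
		return strings.HasPrefix(location, prefix)
	}
}

func main() {
	filter := underPrefix("https://example.com/stac/2022/")

	for _, location := range []string{
		"https://example.com/stac/2022/catalog.json",
		"https://example.com/stac/2021/catalog.json",
	} {
		fmt.Println(location, filter(location))
	}
}
```

With the real package, the returned function would be assigned to the Filter field of an Options value passed to Crawl. Resources for which it returns false are neither read nor visited.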

type RecursionType

type RecursionType string

RecursionType informs the crawler how to treat linked resources. None will only call the visitor for the first resource. Children will call the visitor for all child catalogs, collections, and items. All will call the visitor for parent resources as well as child resources.

const (
	None     RecursionType = "none"
	Children RecursionType = "children"
)

type Resource

type Resource map[string]interface{}

Resource represents a STAC catalog, collection, or item.

func (Resource) Assets added in v0.12.0

func (r Resource) Assets() map[string]Asset

Assets returns the resource's assets (if any).

func (Resource) ConformsTo added in v0.6.0

func (r Resource) ConformsTo() []string

ConformsTo returns the STAC / OGC Features API conformance classes (if any).

func (Resource) Extensions

func (r Resource) Extensions() []string

Extensions returns the resource extension URLs.

func (Resource) Links

func (r Resource) Links() Links

Links returns the resource links.

func (Resource) Type

func (r Resource) Type() ResourceType

Type returns the specific resource type.

func (Resource) Version

func (r Resource) Version() string

Version returns the STAC version.

type ResourceType

type ResourceType string

ResourceType indicates the STAC resource type.

const (
	Item       ResourceType = "item"
	Catalog    ResourceType = "catalog"
	Collection ResourceType = "collection"
)

type Task added in v0.9.0

type Task struct {
	Url  string
	Type string
}

type Visitor

type Visitor func(string, Resource) error

Visitor is called for each resource during crawling.

The resource location (URL or file path) is passed as the first argument. Any returned error will stop crawling and be returned by Crawl.
