walk

package
v0.3.0 Latest
Warning

This package is not in the latest version of its module.

Published: Sep 25, 2023 | License: BSD-3-Clause | Imports: 13 | Imported by: 0

Documentation

Overview

Package walk provides interfaces and methods for walking Library of Congress (LoC) data files.

Constants

This section is empty.

Variables

This section is empty.

Functions

func RegisterWalker

func RegisterWalker(ctx context.Context, scheme string, f WalkerInitializeFunc) error

RegisterWalker() associates 'scheme' with 'f' in an internal list of available `Walker` implementations.

func Schemes

func Schemes() []string

Schemes() returns the list of schemes that have been "registered".

Types

type LocalWalkReader added in v0.2.0

type LocalWalkReader struct {
	WalkReader
	// contains filtered or unexported fields
}

type LocalWalkReader implements the `WalkReader` interface for files on a local disk.

func (*LocalWalkReader) Close added in v0.2.0

func (r *LocalWalkReader) Close() error

Close closes the underlying `os.File` instance for 'r'.

func (*LocalWalkReader) Read added in v0.2.0

func (r *LocalWalkReader) Read(p []byte) (int, error)

Read reads up to len(p) bytes into p. It returns the number of bytes read (0 <= n <= len(p)) and any error encountered. Even if Read returns n < len(p), it may use all of p as scratch space during the call. If some data is available but not len(p) bytes, Read conventionally returns what is available instead of waiting for more.

func (*LocalWalkReader) ReadAt added in v0.2.0

func (r *LocalWalkReader) ReadAt(p []byte, off int64) (int, error)

ReadAt reads len(p) bytes into p starting at offset off.

type NDJSONWalker

type NDJSONWalker struct {
	Walker
	// contains filtered or unexported fields
}

type NDJSONWalker implements the `Walker` interface for NDJSON files.

func (*NDJSONWalker) WalkFile

func (w *NDJSONWalker) WalkFile(ctx context.Context, cb WalkCallbackFunction, uri string) error

WalkFile() processes 'uri', dispatching each record to 'cb'.

func (*NDJSONWalker) WalkReader

func (w *NDJSONWalker) WalkReader(ctx context.Context, cb WalkCallbackFunction, r io.Reader) error

WalkReader() processes each record in 'r' (which is expected to be a newline-delimited JSON document) and dispatches each record to 'cb'.

func (*NDJSONWalker) WalkURIs

func (w *NDJSONWalker) WalkURIs(ctx context.Context, cb WalkCallbackFunction, uris ...string) error

WalkURIs() processes 'uris' dispatching each record to 'cb'. 'uris' is expected to be a list of compressed ('.zip') or uncompressed files on disk.

func (*NDJSONWalker) WalkZipFile

func (w *NDJSONWalker) WalkZipFile(ctx context.Context, cb WalkCallbackFunction, uri string) error

WalkZipFile() decompresses 'uri' and processes each file contained in the zip archive, dispatching each record to 'cb'.

type RemoteWalkReader added in v0.2.0

type RemoteWalkReader struct {
	WalkReader
	// contains filtered or unexported fields
}

type RemoteWalkReader implements the `WalkReader` interface for files on a remote web server.

func (*RemoteWalkReader) Close added in v0.2.0

func (r *RemoteWalkReader) Close() error

Close is a no-op.

func (*RemoteWalkReader) Read added in v0.2.0

func (r *RemoteWalkReader) Read(p []byte) (int, error)

Read reads up to len(p) bytes into p. It returns the number of bytes read (0 <= n <= len(p)) and any error encountered. Even if Read returns n < len(p), it may use all of p as scratch space during the call. If some data is available but not len(p) bytes, Read conventionally returns what is available instead of waiting for more.

func (*RemoteWalkReader) ReadAt added in v0.2.0

func (r *RemoteWalkReader) ReadAt(p []byte, off int64) (int, error)

ReadAt reads len(p) bytes into p starting at offset off.

type WalkCallbackFunction

type WalkCallbackFunction func(context.Context, []byte) error

type WalkCallbackFunction defines a user-specified callback function for processing the individual records in a LoC data file.

type WalkReader added in v0.2.0

type WalkReader interface {
	// Read reads up to len(p) bytes into p. It returns the number of bytes read (0 <= n <= len(p)) and any error encountered. Even if Read returns n < len(p), it may use all of p as scratch space during the call. If some data is available but not len(p) bytes, Read conventionally returns what is available instead of waiting for more.
	Read(p []byte) (int, error)
	// ReadAt reads len(buf) bytes into buf starting at offset off.
	ReadAt([]byte, int64) (int, error)
	// Close closes any underlying file handles. It is implementation specific.
	Close() error
}

WalkReader is an interface that combines the `io.Reader`, `io.ReaderAt` and `io.Closer` interfaces for reading Library of Congress data files. It provides a common interface for reading local and remote data files, whether or not they are compressed.

func OpenURI added in v0.2.0

func OpenURI(ctx context.Context, uri string) (WalkReader, int64, error)

OpenURI opens 'uri' and returns a `WalkReader` and the size of the file. If 'uri' is prefixed with "https://", the body of the file is retrieved via an HTTP GET request.

type Walker

type Walker interface {
	// WalkURIs iterates (walks) LoC data files from one or more URIs.
	WalkURIs(context.Context, WalkCallbackFunction, ...string) error
	// WalkFile iterates (walks) a LoC data file on disk.
	WalkFile(context.Context, WalkCallbackFunction, string) error
	// WalkZipFile iterates (walks) a LoC zip-compressed data file on disk.
	WalkZipFile(context.Context, WalkCallbackFunction, string) error
	// WalkReader iterates (walks) LoC data from an `io.Reader` instance.
	WalkReader(context.Context, WalkCallbackFunction, io.Reader) error
}

type Walker defines an interface for iterating (walking) LoC data files from a variety of sources.

func NewNDJSONWalker

func NewNDJSONWalker(ctx context.Context, uri string) (Walker, error)

NewNDJSONWalker creates a new instance that implements the `Walker` interface for NDJSON files configured by 'uri' which is expected to take the form of:

ndjson://?{PARAMETERS}

Where {PARAMETERS} may be:

* `?workers=` The maximum number of simultaneous workers for processing NDJSON records. Default is 100.

func NewWalker

func NewWalker(ctx context.Context, uri string) (Walker, error)

NewWalker() returns a new `Walker` instance derived from 'uri'. The semantics of and requirements for 'uri' are specific to the package implementing the interface.

type WalkerInitializeFunc

type WalkerInitializeFunc func(ctx context.Context, uri string) (Walker, error)

type WalkerInitializeFunc is a function used to initialize an implementation of the `Walker` interface.
