bsearch

package module

v0.2.2 Latest Latest Go to latest Published: Feb 21, 2022 License: MIT Imports: 15 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/ProfoundNetworks/bsearch

Links

Open Source Insights

README ¶

bsearch

bsearch is a go library providing binary search functionality for line-ordered byte streams (such as LC_ALL=C sorted text files). It allows very fast lookups based on line prefixes, like the venerable look(1) unix utility.

bsearch can also be used via a DB object with a simple db-like interface, allowing sorted CSV files to be used as a key-value store, with excellent performance.

bsearch currently only supports bytewise key comparisons (not UTF-8 collations). This restriction may be removed in the future.

Usage

    import "github.com/ProfoundNetworks/bsearch"

    // Instantiate searcher from a file
    bss, err := bsearch.NewSearcherFile(filepath)
    defer bss.Close()

    // Or instantiate searcher from an io.ReaderAt
    bss := bsearch.NewSearcher(reader, datalen)

    // Find first line beginning with searchStr
    line, err := bss.Line([]byte(searchStr))

    // Find position of first line beginning with searchStr
    pos, err := bss.LinePosition([]byte(searchStr))

    // Distinguish not found from other errors
    if err != nil && err == bsearch.ErrNotFound {
        // do something
    } else if err != nil {
        log.Fatal(err)
    }


    // Or via the `DB` interface
    db, err := bsearch.NewDB(filepath)
    defer db.Close()

    // Lookup values via byteslices or strings
    valbytes := db.Get(keybytes)
    valstr := db.GetString(keystring)

Status

bsearch is alpha software. Interfaces may break and change until an official version 1.0.0 is released.

Copyright and Licence

This project is licensed under the terms of the MIT license.

Documentation ¶

Index ¶

Variables
func IndexPath(path string) (string, error)
type DB
- func NewDB(path string) (*DB, error)
type Index
- func (i *Index) Write() error
type IndexEntry
type IndexOptions
type Searcher
- func NewSearcher(path string) (*Searcher, error)
- func NewSearcherOptions(path string, opt SearcherOptions) (*Searcher, error)
type SearcherOptions

Constants ¶

This section is empty.

Variables ¶

View Source

var (
	ErrIndexNotFound     = errors.New("index file not found")
	ErrIndexExpired      = errors.New("index file out of date")
	ErrIndexEmpty        = errors.New("index contains no entries")
	ErrIndexPathMismatch = errors.New("index file path mismatch")
)

View Source

var (
	ErrFileNotFound        = errors.New("filepath not found")
	ErrNotFile             = errors.New("filepath exists but is not a file")
	ErrFileCompressed      = errors.New("filepath exists but is compressed")
	ErrNotFound            = errors.New("key not found")
	ErrKeyExceedsBlocksize = errors.New("key length exceeds blocksize")
	ErrUnknownDelimiter    = errors.New("cannot guess delimiter from filename")
)

Functions ¶

func IndexPath ¶

func IndexPath(path string) (string, error)

IndexPath returns the filepath of the index assocated with path

Types ¶

type DB ¶

type DB struct {
	// contains filtered or unexported fields
}

DB provides a simple key-value-store-like interface using bsearch.Searcher, returning the first value from path for a given key (if you need more control you're encouraged to use bsearch.Searcher directly).

func NewDB ¶

func NewDB(path string) (*DB, error)

NewDB returns a new DB for the file at path. The caller is responsible for calling DB.Close() when finished (e.g. via defer).

func (*DB) Close ¶

func (db *DB) Close()

Close closes our Searcher's underlying reader (if applicable)

func (*DB) Get ¶

func (db *DB) Get(key []byte) ([]byte, error)

Get returns the (first) value associated with key in db (or ErrNotFound if missing)

func (*DB) GetString ¶

func (db *DB) GetString(key string) (string, error)

GetString returns the (first) value associated with key in db, as a string (or ErrNotFound if missing)

type Index ¶

type Index struct {
	Blocksize      int          `yaml:"blocksize"`
	Delimiter      []byte       `yaml:"delim"`
	Epoch          int64        `yaml:"epoch"`
	Filepath       string       `yaml:"filepath"`
	Header         bool         `yaml:"header"`
	KeysIndexFirst bool         `yaml:"keys_index_first"`
	KeysUnique     bool         `yaml:"keys_unique"`
	Length         int          `yaml:"length"`
	List           []IndexEntry `yaml:"list"`
	Version        int          `yaml:"version"`
	// contains filtered or unexported fields
}

Index provides index metadata for the Filepath dataset

func LoadIndex ¶

func LoadIndex(path string) (*Index, error)

LoadIndex loads Index from the associated index file for path. Returns ErrIndexNotFound if no index file exists. Returns ErrIndexExpired if path is newer than the index file. Returns ErrIndexPathMismatch if index filepath does not equal path.

func NewIndex ¶

func NewIndex(path string) (*Index, error)

NewIndex creates a new Index for the path dataset

func NewIndexOptions ¶

func NewIndexOptions(path string, opt IndexOptions) (*Index, error)

NewIndexOptions creates a new Index for path with delim as the delimiter

func (*Index) Write ¶

func (i *Index) Write() error

Write writes the index to disk

type IndexEntry ¶

type IndexEntry struct {
	Key    string `yaml:"k"`
	Offset int64  `yaml:"o"` // file offset for start-of-block
}

type IndexOptions ¶

type IndexOptions struct {
	Blocksize int
	Delimiter []byte
	Header    bool
	Logger    *zerolog.Logger // debug logger
}

type Searcher ¶

type Searcher struct {
	Index *Index // bsearch index
	// contains filtered or unexported fields
}

Searcher provides binary search functionality on byte-ordered CSV-style delimited text files.

func NewSearcher ¶

func NewSearcher(path string) (*Searcher, error)

NewSearcher returns a new Searcher for path using default options. The caller is responsible for calling *Searcher.Close() when finished.

func NewSearcherOptions ¶

func NewSearcherOptions(path string, opt SearcherOptions) (*Searcher, error)

NewSearcherOptions returns a new Searcher for path using opt. The caller is responsible for calling *Searcher.Close() when finished.

func (*Searcher) Close ¶

func (s *Searcher) Close()

Close closes the searcher's reader (if applicable)

func (*Searcher) Line ¶

func (s *Searcher) Line(key []byte) ([]byte, error)

Line returns the first line in the reader that begins with key, using a binary search (data must be bytewise-ordered).

func (*Searcher) Lines ¶

func (s *Searcher) Lines(b []byte) ([][]byte, error)

Lines returns all lines in the reader that begin with the byte slice b, using a binary search (data must be bytewise-ordered).

func (*Searcher) LinesN ¶

func (s *Searcher) LinesN(key []byte, n int) ([][]byte, error)

LinesN returns the first n lines in the reader that begin with key, using a binary search (data must be bytewise-ordered).

type SearcherOptions ¶

type SearcherOptions struct {
	MatchLE bool            // use less-than-or-equal-to match semantics
	Logger  *zerolog.Logger // debug logger
	// Index options (used to check index or build new one)
	Delimiter []byte // delimiter separating fields in dataset
	Header    bool   // first line of dataset is header and should be ignored
}

SearcherOptions struct for use with NewSearcherOptions

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
cmd
bsearch
bsearch_index
bsearch_selftest

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL