bsearch

package module
v0.2.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 21, 2022 License: MIT Imports: 15 Imported by: 0

README

bsearch

bsearch is a go library providing binary search functionality for line-ordered byte streams (such as LC_ALL=C sorted text files). It allows very fast lookups based on line prefixes, like the venerable look(1) unix utility.

bsearch can also be used via a DB object with a simple db-like interface, allowing sorted CSV files to be used as a key-value store, with excellent performance.

bsearch currently only supports bytewise key comparisons (not UTF-8 collations). This restriction may be removed in the future.

Usage

    import "github.com/ProfoundNetworks/bsearch"

    // Instantiate searcher from a file
    bss, err := bsearch.NewSearcherFile(filepath)
    defer bss.Close()

    // Or instantiate searcher from an io.ReaderAt
    bss := bsearch.NewSearcher(reader, datalen)

    // Find first line beginning with searchStr
    line, err := bss.Line([]byte(searchStr))

    // Find position of first line beginning with searchStr
    pos, err := bss.LinePosition([]byte(searchStr))

    // Distinguish not found from other errors
    if err != nil && err == bsearch.ErrNotFound {
        // do something
    } else if err != nil {
        log.Fatal(err)
    }


    // Or via the `DB` interface
    db, err := bsearch.NewDB(filepath)
    defer db.Close()

    // Lookup values via byteslices or strings
    valbytes := db.Get(keybytes)
    valstr := db.GetString(keystring)

Status

bsearch is alpha software. Interfaces may break and change until an official version 1.0.0 is released.

Copyright 2020 Profound Networks LLC.

This project is licensed under the terms of the MIT license.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	ErrIndexNotFound     = errors.New("index file not found")
	ErrIndexExpired      = errors.New("index file out of date")
	ErrIndexEmpty        = errors.New("index contains no entries")
	ErrIndexPathMismatch = errors.New("index file path mismatch")
)
View Source
var (
	ErrFileNotFound        = errors.New("filepath not found")
	ErrNotFile             = errors.New("filepath exists but is not a file")
	ErrFileCompressed      = errors.New("filepath exists but is compressed")
	ErrNotFound            = errors.New("key not found")
	ErrKeyExceedsBlocksize = errors.New("key length exceeds blocksize")
	ErrUnknownDelimiter    = errors.New("cannot guess delimiter from filename")
)

Functions

func IndexPath

func IndexPath(path string) (string, error)

IndexPath returns the filepath of the index assocated with path

Types

type DB

type DB struct {
	// contains filtered or unexported fields
}

DB provides a simple key-value-store-like interface using bsearch.Searcher, returning the first value from path for a given key (if you need more control you're encouraged to use bsearch.Searcher directly).

func NewDB

func NewDB(path string) (*DB, error)

NewDB returns a new DB for the file at path. The caller is responsible for calling DB.Close() when finished (e.g. via defer).

func (*DB) Close

func (db *DB) Close()

Close closes our Searcher's underlying reader (if applicable)

func (*DB) Get

func (db *DB) Get(key []byte) ([]byte, error)

Get returns the (first) value associated with key in db (or ErrNotFound if missing)

func (*DB) GetString

func (db *DB) GetString(key string) (string, error)

GetString returns the (first) value associated with key in db, as a string (or ErrNotFound if missing)

type Index

type Index struct {
	Blocksize      int          `yaml:"blocksize"`
	Delimiter      []byte       `yaml:"delim"`
	Epoch          int64        `yaml:"epoch"`
	Filepath       string       `yaml:"filepath"`
	Header         bool         `yaml:"header"`
	KeysIndexFirst bool         `yaml:"keys_index_first"`
	KeysUnique     bool         `yaml:"keys_unique"`
	Length         int          `yaml:"length"`
	List           []IndexEntry `yaml:"list"`
	Version        int          `yaml:"version"`
	// contains filtered or unexported fields
}

Index provides index metadata for the Filepath dataset

func LoadIndex

func LoadIndex(path string) (*Index, error)

LoadIndex loads Index from the associated index file for path. Returns ErrIndexNotFound if no index file exists. Returns ErrIndexExpired if path is newer than the index file. Returns ErrIndexPathMismatch if index filepath does not equal path.

func NewIndex

func NewIndex(path string) (*Index, error)

NewIndex creates a new Index for the path dataset

func NewIndexOptions

func NewIndexOptions(path string, opt IndexOptions) (*Index, error)

NewIndexOptions creates a new Index for path with delim as the delimiter

func (*Index) Write

func (i *Index) Write() error

Write writes the index to disk

type IndexEntry

type IndexEntry struct {
	Key    string `yaml:"k"`
	Offset int64  `yaml:"o"` // file offset for start-of-block
}

type IndexOptions

type IndexOptions struct {
	Blocksize int
	Delimiter []byte
	Header    bool
	Logger    *zerolog.Logger // debug logger
}

type Searcher

type Searcher struct {
	Index *Index // bsearch index
	// contains filtered or unexported fields
}

Searcher provides binary search functionality on byte-ordered CSV-style delimited text files.

func NewSearcher

func NewSearcher(path string) (*Searcher, error)

NewSearcher returns a new Searcher for path using default options. The caller is responsible for calling *Searcher.Close() when finished.

func NewSearcherOptions

func NewSearcherOptions(path string, opt SearcherOptions) (*Searcher, error)

NewSearcherOptions returns a new Searcher for path using opt. The caller is responsible for calling *Searcher.Close() when finished.

func (*Searcher) Close

func (s *Searcher) Close()

Close closes the searcher's reader (if applicable)

func (*Searcher) Line

func (s *Searcher) Line(key []byte) ([]byte, error)

Line returns the first line in the reader that begins with key, using a binary search (data must be bytewise-ordered).

func (*Searcher) Lines

func (s *Searcher) Lines(b []byte) ([][]byte, error)

Lines returns all lines in the reader that begin with the byte slice b, using a binary search (data must be bytewise-ordered).

func (*Searcher) LinesN

func (s *Searcher) LinesN(key []byte, n int) ([][]byte, error)

LinesN returns the first n lines in the reader that begin with key, using a binary search (data must be bytewise-ordered).

type SearcherOptions

type SearcherOptions struct {
	MatchLE bool            // use less-than-or-equal-to match semantics
	Logger  *zerolog.Logger // debug logger
	// Index options (used to check index or build new one)
	Delimiter []byte // delimiter separating fields in dataset
	Header    bool   // first line of dataset is header and should be ignored
}

SearcherOptions struct for use with NewSearcherOptions

Directories

Path Synopsis
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL