ef

package module
v0.0.0-...-9622e46 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 25, 2021 License: BSD-3-Clause Imports: 2 Imported by: 0

README

Dictionary with Elias-Fano compression

Build Status Go Reference Go Report Card codecov

The static Elias-Fano representation encodes a sequence of positive, monotonically increasing integers. The representation is quasi-succinct, meaning it is less than half a bit per element away from the 0-th order empirical entropy. It supports random access of an integer in O(1) worst-case time, despite the compression. However, for random access to be fast in practice, it requires the integration of a Select index. The representation also facilitates efficient search capabilities, such as NextGEQ() and NextLEQ().

The Elias-Fano representation is built into a dictionary. Its purpose is to keep large volumes of integers in RAM in a compressed format, while still being efficient to use. The fastest access method is through iteration, but direct access is fast as well.

This is an implementation of a static Elias-Fano representation. So, stored values cannot be updated. If this is essential to your use case, please have a look at [2].

The implementation is to a large extent based on the paper of S. Vigna [1]. Forward and skip pointers are not implemented. This implementation relies on a fast Select index instead. Storage of unsorted and negative integers is on the roadmap.

The API is NOT stable yet. You can send feedback about the current API by reporting an issue.

Installation

Install ef with go get:

$ go get -u github.com/gevg/ef

Usage

Construct the dictionary, while initializing it with a number array, and iterate over the values.

// Initialize a slice of monotonically non-decreasing numbers and construct the compressed dictionary.
values := []uint{1, 3, 5, 7, 9, 11, 11, 13, 15, 17}
dict, err := From(values)

When not all values are available at construction, or when the construction of a huge values slice is not desirable, one can construct a dictionary using New() and add each entry individually with Append(). The dictionary is ready to use after each Append. There is no need to completely fill the dictionary. However, a partially filled dictionary will show inferior compression.

// Construct the dictionary with New(). Supply the number of values and
// the maximum value as configuration parameters. Add the entries.
nValues := len(values)
maxValue := values[nValues-1]

dict, err := New(nValues, maxValue)
for _, v := range values {
    if err := dict.Append(v); err != nil {
        ...
    }
}
// Cap() returns the capacity (max. number of entries) of the dictionary.
// Len() returns the current number of input values in the dictionary.
cap := Cap(dict)
length := Len(dict)

There are several ways to read back data. One can directly read one or more values, starting at a given index.

// Read the value at index 2.
v, ok := dict.Value(2)

// Read 5 values starting at index start := 4.
vals := make([]uint, 5)
vals, ok := v.RangeValues(start, vals)

// Read all values from the dictionary.
for k, v := range dict.Values() {
    ...
}

Alternatively, one can use an iterator. This allows for concurrent access.

// Create an iterator.
it := dict.Iterator()
vals := make([]uint, 0, m)

// Retrieve the first m values of the dictionary.
for i := 0; i < m; i++ {
    v, _ := it.Next()
    vals = append(vals, v)
}

The iterator does not have to start reading at index 0. A read with it.Value(k) returns the k-th value and sets the iterator state on index k+1. Subsequent reads with Next() allows to read the other entries.

// Retrieval of the values in index range [k, l). Read the k-th value and add it to the vals slice.
vals := make([]uint, 0, l-k)
v, ok := it.Value(k)
if !ok {
    ...
}
vals = append(vals, v)

// Read the other values by iterating with Next().
for i:=1; i < k-l; i++ {
    if v, ok = it.Next(); !ok {
        ...
    }
    vals = append(vals, v)
}

Benchmarks

TODO

To Do

Implementation of the following features:

  • Integration of a fast Select() algorithm
  • Support of int values
  • Support of strict monotonicity.
  • Support of non-monotonic series
  • Support of search: NextLEQ() and NextGEQ()
  • Benchmarks and further performance improvements

License

ef is available under the BSD 3-Clause License.

References

[1] S. Vigna: Quasi-succinct Indices, WSDM '13: Proceedings of the sixth ACM international conference on Web search and data mining, 2013, Pages 83–92

[2] G.E. Pibiri, R. Venturini: Dynamic Elias-Fano Representation, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017), Warsaw, Poland, 2017

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Cap

func Cap(d *Dict) int

Cap returns the maximum number of entries in the dictionary.

func Len

func Len(d *Dict) int

Len returns the current number of entries in the dictionary. When the dictionary is built with From, Len and Cap always return the same result.

Types

type Dict

type Dict struct {
	// contains filtered or unexported fields
}

Dict is a dictionary, built upon a bit sequence, compressed with the Elias-Fano compression algorithm.

func From

func From(values []uint) (*Dict, error)

From constructs a dictionary from a list of values.

func New

func New(cap int, maxValue uint) (Dict, error)

New constructs a dictionary with a capacity of n entries. The dictionary can accept positive numbers, smaller than or equal to maxValue. The values needed to be in monotonic, non-decreasing way.

func (*Dict) Append

func (d *Dict) Append(value uint) int

Append appends a value to the dictionary and returns the key. If something goes wrong, Append returns -1. In that case, no value is added.

func (*Dict) Iterator

func (d *Dict) Iterator() Iterator

Iterator returns an iterator that iterates over the values in the dictionary. Iteration should not start before the static dictionary has been built. Iterators are required for concurrent access.

func (*Dict) NextGEQ

func (d *Dict) NextGEQ(value uint) (i int, next uint)

NextGEQ returns the first value in the dictionary equal to or larger than the given value. NextGEQ fails when there exists no value larger than or equal to value. If so, it returns -1 as index.

func (*Dict) NextLEQ

func (d *Dict) NextLEQ(value uint) (i int, previous uint)

NextLEQ ...

func (*Dict) RangeValues

func (d *Dict) RangeValues(start, end int) []uint

RangeValues returns a contiguous range of values, from index start to end non-inclusive.

func (*Dict) Value

func (d *Dict) Value(k int) (uint, bool)

Value returns the k-th value in the dictionary.

func (*Dict) Value2

func (d *Dict) Value2(k int) (uint, bool)

Value2 returns the k-th value in the dictionary.

func (*Dict) Values

func (d *Dict) Values() []uint

Values reads all numbers from the dictionary.

type Iterator

type Iterator struct {
	// contains filtered or unexported fields
}

Iterator enables concurrent iteration over the dictionary.

func (*Iterator) Next

func (it *Iterator) Next() (uint, bool)

Next returns the next iteration value. If the iterator is not initialized through a call to Value, the first call to Next will return the first value in the dictionary.

func (*Iterator) Value

func (it *Iterator) Value(k int) (uint, bool)

Value returns the value at index k in the dictionary. It also sets the iterator state. Subsequent calls to Next will return the values with index k+1, k+2, ..., n-1. Iteration beyond n-1 will return 0, false.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL