arenaskl

package
v1.2.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 17, 2024 License: Apache-2.0, Apache-2.0 Imports: 8 Imported by: 0

README

arenaskl

Fast, lock-free, arena-based Skiplist implementation in Go that supports iteration in both directions.

Since this is pebble's internal lib, matrixone copied it and modified it to support some features and make it easier to use in matrixone.

Advantages

Arenaskl offers several advantages over other skiplist implementations:

  • High performance that linearly scales with the number of cores. This is achieved by allocating from a fixed-size arena and by avoiding locks.
  • Iterators that can be allocated on the stack and easily cloned by value.
  • Simple-to-use and low overhead model for detecting and handling race conditions with other threads.
  • Support for iterating in reverse (i.e. previous links).

Limitations

The advantages come at a cost that prevents arenaskl from being a general-purpose skiplist implementation:

  • The size of the arena sets a hard upper bound on the combined size of skiplist nodes, keys, and values. This limit includes even the size of deleted nodes, keys, and values.
  • Deletion is not supported. Instead, higher-level code is expected to add deletion tombstones and needs to process those tombstones appropriately.

Pedigree

This code is based on Andy Kimball's arenaskl code:

https://github.com/andy-kimball/arenaskl

The arenaskl code is based on the skiplist found in Badger, a Go-based KV store:

https://github.com/dgraph-io/badger/tree/master/skl

The skiplist in Badger is itself based on a C++ skiplist built for Facebook's RocksDB:

https://github.com/facebook/rocksdb/tree/master/memtable

Benchmarks

The benchmarks consist of a mix of reads and writes executed in parallel. The fraction of reads is indicated in the run name: "frac_X" indicates a run where X percent of the operations are reads.

The results are much better than skiplist and slist.

name                  time/op
ReadWrite/frac_0-8     470ns ±11%
ReadWrite/frac_10-8    462ns ± 3%
ReadWrite/frac_20-8    436ns ± 2%
ReadWrite/frac_30-8    410ns ± 2%
ReadWrite/frac_40-8    385ns ± 2%
ReadWrite/frac_50-8    360ns ± 4%
ReadWrite/frac_60-8    386ns ± 1%
ReadWrite/frac_70-8    352ns ± 2%
ReadWrite/frac_80-8    306ns ± 3%
ReadWrite/frac_90-8    253ns ± 4%
ReadWrite/frac_100-8  28.1ns ± 2%

Note that the above numbers are for concurrent operations using 8x parallelism. The same benchmarks without concurrency (use these numbers when comparing vs batchskl):

name                time/op
ReadWrite/frac_0    1.53µs ± 1%
ReadWrite/frac_10   1.46µs ± 2%
ReadWrite/frac_20   1.39µs ± 3%
ReadWrite/frac_30   1.28µs ± 3%
ReadWrite/frac_40   1.21µs ± 2%
ReadWrite/frac_50   1.11µs ± 3%
ReadWrite/frac_60   1.23µs ±17%
ReadWrite/frac_70   1.16µs ± 4%
ReadWrite/frac_80    959ns ± 3%
ReadWrite/frac_90    738ns ± 5%
ReadWrite/frac_100  81.9ns ± 2%

Forward and backward iteration are also fast:

name                time/op
IterNext            3.97ns ± 5%
IterPrev            3.88ns ± 3%

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	// ErrArenaFull indicates that the arena is full and cannot perform any more
	// allocations.
	ErrArenaFull = moerr.NewArenaFullNoCtx()
)
View Source
var ErrRecordExists = moerr.NewKeyAlreadyExistsNoCtx()

ErrRecordExists indicates that an entry with the specified key already exists in the skiplist. Duplicate entries are not directly supported and instead must be handled by the user by appending a unique version suffix to keys.

Functions

func MaxNodeSize

func MaxNodeSize(keySize, valueSize uint32) uint64

MaxNodeSize returns the maximum space needed for a node with the specified key and value sizes. This could overflow a uint32, which is why a uint64 is used here. If a key/value overflows a uint32, it should not be added to the skiplist.

Types

type Arena

type Arena struct {
	// contains filtered or unexported fields
}

Arena is lock-free.

func NewArena

func NewArena(buf []byte) *Arena

NewArena allocates a new arena using the specified buffer as the backing store.

func (*Arena) Capacity

func (a *Arena) Capacity() uint32

Capacity returns the capacity of the arena.

func (*Arena) Size

func (a *Arena) Size() uint32

Size returns the number of bytes allocated by the arena.

type Compare

type Compare func(a, b []byte) int

Compare is a comparison function for keys.

type Inserter

type Inserter struct {
	// contains filtered or unexported fields
}

Inserter TODO(peter)

func (*Inserter) Add

func (ins *Inserter) Add(list *Skiplist, key, value []byte) error

Add TODO(peter)

type Iterator

type Iterator struct {
	// contains filtered or unexported fields
}

Iterator is an iterator over the skiplist object. Use Skiplist.NewIter to construct an iterator. The current state of the iterator can be cloned by simply value copying the struct. All iterator methods are thread-safe.

func (*Iterator) Close

func (it *Iterator) Close() error

Close resets the iterator.

func (*Iterator) Error

func (it *Iterator) Error() error

Error returns any accumulated error.

func (*Iterator) First

func (it *Iterator) First() (bool, []byte, []byte)

First seeks position at the first entry in list. Returns the key and value if the iterator is pointing at a valid entry, and (nil, nil) otherwise. Note that First only checks the upper bound. It is up to the caller to ensure that key is greater than or equal to the lower bound (e.g. via a call to SeekGE(lower)).

func (*Iterator) Head

func (it *Iterator) Head() bool

Head true iff the iterator is positioned at the sentinel head node.

func (*Iterator) Last

func (it *Iterator) Last() (bool, []byte, []byte)

Last seeks position at the last entry in list. Returns the key and value if the iterator is pointing at a valid entry, and (nil, nil) otherwise. Note that Last only checks the lower bound. It is up to the caller to ensure that key is less than the upper bound (e.g. via a call to SeekLT(upper)).

func (*Iterator) Next

func (it *Iterator) Next() (bool, []byte, []byte)

Next advances to the next position. Returns the key and value if the iterator is pointing at a valid entry, and (nil, nil) otherwise. Note: flushIterator.Next mirrors the implementation of Iterator.Next due to performance. Keep the two in sync.

func (*Iterator) Prev

func (it *Iterator) Prev() (bool, []byte, []byte)

Prev moves to the previous position. Returns the key and value if the iterator is pointing at a valid entry, and (nil, nil) otherwise.

func (*Iterator) SeekGE

func (it *Iterator) SeekGE(key []byte) (bool, []byte, []byte)

SeekGE moves the iterator to the first entry whose key is greater than or equal to the given key. Returns the key and value if the iterator is pointing at a valid entry, and (nil, nil) otherwise. Note that SeekGE only checks the upper bound. It is up to the caller to ensure that key is greater than or equal to the lower bound.

func (*Iterator) SeekLT

func (it *Iterator) SeekLT(key []byte) (bool, []byte, []byte)

SeekLT moves the iterator to the last entry whose key is less than the given key. Returns the key and value if the iterator is pointing at a valid entry, and (nil, nil) otherwise. Note that SeekLT only checks the lower bound. It is up to the caller to ensure that key is less than the upper bound.

func (*Iterator) SeekPrefixGE

func (it *Iterator) SeekPrefixGE(
	prefix, key []byte,
) (bool, []byte, []byte)

SeekPrefixGE moves the iterator to the first entry whose key is greater than or equal to the given key. This method is equivalent to SeekGE and is provided so that an arenaskl.Iterator implements the internal/base.InternalIterator interface.

func (*Iterator) SetBounds

func (it *Iterator) SetBounds(lower, upper []byte)

SetBounds sets the lower and upper bounds for the iterator. Note that the result of Next and Prev will be undefined until the iterator has been repositioned with SeekGE, SeekPrefixGE, SeekLT, First, or Last.

func (*Iterator) String

func (it *Iterator) String() string

func (*Iterator) Tail

func (it *Iterator) Tail() bool

Tail true iff the iterator is positioned at the sentinel tail node.

type Skiplist

type Skiplist struct {
	// contains filtered or unexported fields
}

Skiplist is a fast, concurrent skiplist implementation that supports forward and backward iteration. See batchskl.Skiplist for a non-concurrent skiplist. Keys and values are immutable once added to the skiplist and deletion is not supported. Instead, higher-level code is expected to add new entries that shadow existing entries and perform deletion via tombstones. It is up to the user to process these shadow entries and tombstones appropriately during retrieval.

func NewSkiplist

func NewSkiplist(arena *Arena, cmp Compare) *Skiplist

NewSkiplist constructs and initializes a new, empty skiplist. All nodes, keys, and values in the skiplist will be allocated from the given arena.

func (*Skiplist) Add

func (s *Skiplist) Add(key, value []byte) error

Add adds a new key if it does not yet exist. If the key already exists, then Add returns ErrRecordExists. If there isn't enough room in the arena, then Add returns ErrArenaFull.

func (*Skiplist) Arena

func (s *Skiplist) Arena() *Arena

Arena returns the arena backing this skiplist.

func (*Skiplist) Height

func (s *Skiplist) Height() uint32

Height returns the height of the highest tower within any of the nodes that have ever been allocated as part of this skiplist.

func (*Skiplist) NewIter

func (s *Skiplist) NewIter(lower, upper []byte) *Iterator

NewIter returns a new Iterator object. The lower and upper bound parameters control the range of keys the iterator will return. Specifying for nil for lower or upper bound disables the check for that boundary. Note that lower bound is not checked on {SeekGE,First} and upper bound is not check on {SeekLT,Last}. The user is expected to perform that check. Note that it is safe for an iterator to be copied by value.

func (*Skiplist) Reset

func (s *Skiplist) Reset(arena *Arena, cmp Compare)

Reset the skiplist to empty and re-initialize.

func (*Skiplist) Size

func (s *Skiplist) Size() uint32

Size returns the number of bytes that have allocated from the arena.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL