Documentation ¶
Overview ¶
Package store provides a map[[]byte][]byte based on the robin-hood hashmap algorithm that's hookable with persistance or storage.
Index ¶
- Constants
- Variables
- func ByteSliceToUint64Slice(in []byte) ([]uint64, error)
- func BytesAppend(m *RHStore, b []byte) (offset, size uint64, err error)
- func BytesRead(m *RHStore, offset, size uint64) ([]byte, error)
- func BytesTruncate(m *RHStore, size uint64) error
- func Grow(m *RHStore, newSize int) error
- func Uint64SliceToByteSlice(in []uint64) ([]byte, error)
- type BytesLessFunc
- type Chunks
- func (cs *Chunks) AddChunk() (err error)
- func (cs *Chunks) BytesAppend(b []byte) (offsetOut, sizeOut uint64, err error)
- func (cs *Chunks) BytesRead(offset, size uint64) ([]byte, error)
- func (cs *Chunks) BytesTruncate(size uint64) error
- func (cs *Chunks) Close() error
- func (cs *Chunks) PrevChunkLens() int
- type Heap
- func (h *Heap) Close() error
- func (h *Heap) Error(err error) error
- func (h *Heap) Get(i int64) ([]byte, error)
- func (h *Heap) GetOffsetSize(i int64) ([]byte, uint64, uint64, error)
- func (h *Heap) Len() int
- func (h *Heap) Less(i, j int) bool
- func (h *Heap) Pop() interface{}
- func (h *Heap) Push(x interface{})
- func (h *Heap) PushBytes(xbytes []byte) error
- func (h *Heap) Reset() error
- func (h *Heap) SetOffsetSize(i int64, offset, size uint64) error
- func (h *Heap) Sort(offset int64) error
- func (h *Heap) Swap(i, j int)
- type Item
- type Key
- type MMapRef
- type OffsetSize
- type RHStore
- func (m *RHStore) CopyTo(dest *RHStore)
- func (m *RHStore) Del(k Key) (prev Val, existed bool, err error)
- func (m *RHStore) Get(k Key) (v Val, found bool)
- func (m *RHStore) Item(idx int) Item
- func (m *RHStore) ItemKey(item Item) (Key, error)
- func (m *RHStore) ItemVal(item Item) (Val, error)
- func (m *RHStore) Reset() error
- func (m *RHStore) Set(k Key, v Val) (wasNew bool, err error)
- func (m *RHStore) SetOffsets(kOffset, kSize, vOffset, vSize uint64) (wasNew bool, err error)
- func (m *RHStore) Visit(callback func(k Key, v Val) (keepGoing bool)) error
- func (m *RHStore) VisitOffsets(callback func(kOffset, kSize, vOffset, vSize uint64) (keepGoing bool)) error
- type RHStoreFile
- type RHStoreFileOptions
- type Val
Constants ¶
const ItemLen = 3 // Number of uint64's needed for item metadata.
const MaskDistance = uint64(0xFFFC000000000000) // 14 bits << ShiftDistance.
const MaskKeySize = uint64(0x0000000001FFFFFF) // 25 bits.
const MaskValSize = uint64(0x0003FFFFFE000000) // 25 bits << ShiftValSize.
const MaxKeyLen = (1 << 25) - 1
MaxKeyLen is representable by 25 bit number, or ~33MB.
const MaxValLen = (1 << 25) - 1
MaxKeyLen is representable by 25 bit number, or ~33MB.
const ShiftDistance = 50 // # of bits to left-shift a 14-bit Distance.
const ShiftValSize = 25 // # of bits to left-shift a 25-bit ValSize.
Variables ¶
var DefaultRHStoreFileOptions = RHStoreFileOptions{
StartSize: 5303,
ChunkSizeBytes: 4 * 1024 * 1024,
MaxDistance: 10,
FileSuffix: ".rhstore",
}
DefaultRHStoreFileOptions are the default values for options.
var ErrKeyTooBig = errors.New("key too big")
ErrKeyTooBig means a key was too large.
var ErrKeyZeroLen = errors.New("key zero len")
ErrKeyZeroLen means a key was nil.
var ErrValTooBig = errors.New("val too big")
ErrValTooBig means a val was too large.
var MMapPageGranularity = MMapPageSize
MMapPageGranularity defines the granularity of mmap'ed region. Some operating systems require mmap to occur on particular boundaries that are not equal to a page size.
var MMapPageSize = int64(4096)
MMapPageSize is the default page size in bytes used by rhstore.
Functions ¶
func ByteSliceToUint64Slice ¶
ByteSliceToUint64Slice gives access to []byte as []uint64. By default, an efficient O(1) implementation of this function is used, but which requires the unsafe package. See the "safe" build tag to use an O(N) implementation that does not need the unsafe package.
func BytesAppend ¶
BytesAppend is the default implementation to append data to the backing bytes of an RHStore.
func BytesRead ¶
BytesRead is the default implementation to read a bytes from the backing bytes of an RHStore.
func BytesTruncate ¶
BytesTruncate is the default implementation to truncate the backing bytes of an RHStore to a given length.
func Uint64SliceToByteSlice ¶
Uint64SliceToByteSlice gives access to []uint64 as []byte. By default, an efficient O(1) implementation of this function is used, but which requires the unsafe package. See the "safe" build tag to use an O(N) implementation that does not need the unsafe package.
Types ¶
type BytesLessFunc ¶
BytesLessFunc returns true when a is less than b.
type Chunks ¶
type Chunks struct {
PathPrefix, FileSuffix string
// ChunkSizeBytes is the size of each chunk file.
ChunkSizeBytes int
// Chunks is a sequence of append-only chunk files. An example
// usage is to hold the underlying key/val bytes for a
// hashmap. The 0'th chunk is a special, in-memory-only chunk
// without an actual backing file.
Chunks []*MMapRef
// LastChunkLen is the logical length of the last chunk, which is
// the chunk that is still being appended to when there are new,
// incoming data items.
LastChunkLen int
// Recycled is a stack of chunks that are ready to reuse.
Recycled []*MMapRef
}
Chunks tracks a sequence of persisted chunk files.
func (*Chunks) AddChunk ¶
AddChunk appends a new chunk file to the chunks, or reuses a previously recycled chunk file.
func (*Chunks) BytesAppend ¶
func (*Chunks) BytesTruncate ¶
func (*Chunks) PrevChunkLens ¶
PrevChunkLens returns the sum of the chunk lengths for all but the last chunk.
type Heap ¶
type Heap struct { // LessFunc is used to compare two data items. LessFunc BytesLessFunc // CurItems holds the # of current, live items on the heap. CurItems int64 // MaxItems holds the maximum # of items the heap has ever held. MaxItems int64 // Heap is a min-heap of offset (uint64) and size (uint64) pairs, // which refer into the Data. The Chunks of the Heap must be // configured with a ChunksSizeBytes that's a multiple of 16. Heap *Chunks // Data represents the application data items held in chunks, // where each item is prefixed by its length as a uint64. Data *Chunks // Free represents unused but reusable slices in the Data. The // free list is appended to as items are popped from the heap. Free []OffsetSize // Temp is used during mutations. Temp []byte // Extra is application specific data associated with this heap. Extra interface{} // Err tracks the first error encountered. Err error }
Heap provides a min-heap using a given BytesLessFunc. When the min-heap grows too large, it will automatically spill data to temporary, mmap()'ed files based on the features from rhmap/store/Chunks. The implementation is meant to be used with golang's container/heap package. The implementation is not concurrent safe. The implementation is designed to avoid allocations and reuse existing []byte buffers when possible.
The heap can also be used directly with the PushBytes() API without using golang's container/heap package, in which case this data structure behaves as a appendable sequence of []byte entries.
func (*Heap) GetOffsetSize ¶
Get returns the i'th item from the min-heap, along with its holding area offset and holding area size in the data chunks.
func (*Heap) Push ¶
func (h *Heap) Push(x interface{})
Push will error if the incoming "data length + 8" is greater than the configured ChunkSizeBytes of the heap's data chunks.
func (*Heap) Sort ¶
Sort pops items off the heap and places them at the end of the heap slots in reverse order, leaving sorted items at the end of the heap slots. This approach does not allocate additional space. If there are n items in the heap, then n-offset items will be left sorted at the end of the heap slots. An offset of 0 sorts the entire heap.
type Item ¶
type Item []uint64
Item represents an entry in the RHStore, where each item uses 3 contiguous slots (uint64's) for encoding...
MSB....................................................LSB
uint64 0: [64-bits keyOffset ] uint64 1: [64-bits valOffset ] uint64 2: [14-bits distance] | [25 bits valSize] | [25 bits keySize]
The len(Item) == 3 (i.e., 3 uint64's). The key/val offsets are into the RHStore's backing bytes.
func (Item) DistanceAdd ¶
func (Item) KeyOffsetSize ¶
func (Item) ValOffsetSize ¶
type Key ¶
type Key []byte
Key is the type for a key. A key with len() of 0 is invalid. Key max length is 2^25-1 (~33MB).
type MMapRef ¶
MMapRef provides a ref-counting wrapper around a mmap handle. The implementation is not concurrent safe.
func CreateFileAsMMapRef ¶
CreateFileAsMMapRef creates a new, empty file of the given size in bytes and mmap()'s it. If the path is "", then an in-memory-only MMapRef is returned, which is an MMapRef that really isn't mmap()'ing an actual file.
func MMapFileRegion ¶
func MMapFileRegion(path string, file *os.File, offset, size int64, readWrite bool) (*MMapRef, error)
MMapFileRegion mmap()'s a region of bytes in an os.File.
type OffsetSize ¶
type OffsetSize struct {
Offset, Size uint64
}
OffsetSize associates an offset with a size.
type RHStore ¶
type RHStore struct { // Slots are the backing data for item metadata of the hashmap. // 3 slots are used to represent a single item's metadata. Slots []uint64 // Size is the max number of items this hashmap can hold. // Size * 3 == len(Slots). Size int // Bytes is the backing slice for key/val data that's used by the // default BytesTruncate/Append/Read() implementations. Bytes []byte // Number of items in the RHStore. Count int // Overridable hash func. Defaults to hash/fnv.New32a(). HashFunc func(Key) uint32 // When any item's distance gets too large, grow the RHStore. // Defaults to 10. MaxDistance int // Overridable func to calculate a size multiplier when resizing // for growth is needed. Default returns a constant 2.0. Growth func(*RHStore) float64 // Overridable func to grow the RHStore. Grow func(m *RHStore, newSize int) error // Overridable func to truncate the backing bytes. BytesTruncate func(m *RHStore, n uint64) error // Overridable func to append data to the backing bytes. BytesAppend func(m *RHStore, b []byte) (offset, size uint64, err error) // Overridable func to read data from the backing bytes. BytesRead func(m *RHStore, offset, size uint64) ([]byte, error) // Extra is for optional data that the application wants to // associate with the RHStore instance. Extra interface{} Close func() error // Temp is used during mutations to avoid memory allocations. Temp Item }
RHStore is a hashmap that uses the robinhood algorithm. This implementation is not concurrent safe.
Unlike an RHMap, the key/val bytes placed into an RHStore are copied and owned by the RHStore. The RHStore's internal data structures are also more "flat" than an RHMap's, allowing for easier persistence with an RHStore.
RHStore has more hook-points or callback options than an RHMap, which are intended for advanced users who might use the hook-points to build a persistent data structure.
func (*RHStore) Del ¶
Del removes a key/val from the RHStore. The previous val, if it existed, is returned.
NOTE: RHStore does not remove key/val data from its backing bytes, so deletes of items will not reduce memory usage. Applications may instead use CopyTo() to copy any remaining live key/val data to another, potentially smaller RHStore.
func (*RHStore) Get ¶
Get retrieves the val for a given key. The returned val, if found, is a slice into the RHStore's backing bytes and should only be used within its returned len() -- don't append() to the returned val as that might incorrectly overwrite unrelated data.
func (*RHStore) Set ¶
Set inserts or updates a key/val into the RHStore. The returned wasNew will be true if the mutation was on a newly-seen inserted key, and wasNew will be false if the mutation was an update to an existing key.
NOTE: RHStore appends or copies the incoming key/val into its backing bytes. Multiple updates to the same key will continue to grow the backing bytes -- i.e., the backing bytes are not reused or recycled during a Set(). Applications that need to really remove deleted bytes may instead use CopyTo() to copy live key/val data to another RHStore. Applications might also mutate val bytes in-place as another way to save allocations.
func (*RHStore) SetOffsets ¶
type RHStoreFile ¶
type RHStoreFile struct { // PathPrefix is the path prefix of any persisted files. PathPrefix string // Options configured for this RHStoreFile instance. Options RHStoreFileOptions // The embedded RHStore is hooked with callbacks that spill data // out to disk during growth, to ever larger "slots" file for item // metadata, and to an ever-growing sequence of "chunk" files // where key/val bytes are stored. RHStore // Generation is incremented whenever the hashmap metadata slots // are grown. See Grow(). The initial, in-memory only hashmap // that has the size of Options.StartSize is generation number 0. Generation int64 // Slots holds the hashmap's item metadata slots. The item // metadata entries in the slots have offset+size uint64's that // reference key/val byte slices in the chunks. Slots *MMapRef // Chunks is a sequence of append-only chunk files which hold the // underlying key/val bytes for the hashmap. Chunks }
RHStoreFile represents a persisted hashmap. Its implementation is not concurrent safe.
Unlike an RHMap or RHStore, the key's and val's in an RHStoreFile may not be larger than the Options.ChunkSizeBytes.
The design point is to support applications that need to process or analyze ephemeral data which becomes large enough to not fit comfortably into memory, where an temporary, spillable hashmap is appropriate (e.g., performing a GROUP BY aggregation on CSV data files). A different use-case of long-term database-like storage (e.g., with checksums, versioning, reloading, and multithreaded access, etc) is not the current design target of RHStoreFile.
func CreateRHStoreFile ¶
func CreateRHStoreFile(pathPrefix string, options RHStoreFileOptions) ( rv *RHStoreFile, err error)
CreateRHStoreFile starts a brand new RHStoreFile, which is a hashmap based on the robin-hood algorithm, and which will also spill out to mmap()'ed files if the hashmap becomes too big. The returned RHStoreFile is not concurrent safe. Providing a pathPrefix that's already in-use has undefined behavior.
func (*RHStoreFile) Close ¶
func (sf *RHStoreFile) Close() error
Close releases resources used by the RHStoreFile.
func (*RHStoreFile) Grow ¶
func (sf *RHStoreFile) Grow(nextSize int) error
Grow creates a new slots file and copies over existing metadata items from RHStore.Slots, if any.
type RHStoreFileOptions ¶
type RHStoreFileOptions struct { // Initial size of the first in-memory-only hashmap in the # of // items that its metadata slots can theoretically hold "fully // loaded" before growing. StartSize int // MaxDistance is a config on hashmap growth in that when the // distance of any hashmap item becomes > MaxDistance, the hashmap // metadata slots will be grown (and spilled to mmap()'ed files). MaxDistance int // ChunkSizeBytes is the size of each chunk file in bytes. // No key or val can be larger than ChunkSizeBytes. // ChunkSizeBytes must be > 0. ChunkSizeBytes int // FileSuffix is the file suffix used for all the files that were // created or managed by an RHStoreFile. FileSuffix string }
RHStoreFileOptions represents creation-time configurable options for an RHStoreFile.