Documentation ¶
Overview ¶
A collection data file contains document data. Every document has a binary header followed by UTF-8 text content. Documents are inserted one after another, and each occupies 2x its original size to leave room for future updates. Deleted documents are only marked as deleted; their space is not reclaimed until a "scrub" action (in DB logic) is carried out. When an update takes place, the new document overwrites the original if there is enough room; otherwise the original is marked as deleted and the updated document is inserted as a new document.
Common data file features: enlarge, close, etc.
A hash table file contains binary content; it implements a static hash table made of hash buckets and integer entries. Every bucket has a fixed number of entries. When a bucket becomes full, a new bucket is chained to it in order to store more entries. Every entry has an integer key and value. An entry key may have multiple values assigned to it; however, the combination of entry key and value must be unique across the entire hash table.
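The record layout described above can be sketched as follows. This is a minimal illustration, not the package's code: it assumes the 10-byte room field is a varint as written by `encoding/binary`, that a validity byte of 1 means "valid", and that unused room is filled with spaces. The names `encodeDoc` and `fitsInPlace` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const docHeader = 1 + 10 // validity byte + 10-byte room field, as in DOC_HEADER

// encodeDoc builds one record: header, document text, then space padding,
// so that the record reserves twice the document's size.
func encodeDoc(doc []byte) []byte {
	room := 2 * len(doc)
	rec := make([]byte, docHeader+room)
	rec[0] = 1 // assumption: 1 marks a valid document, 0 a deleted one
	binary.PutVarint(rec[1:docHeader], int64(room))
	copy(rec[docHeader:], doc)
	for i := docHeader + len(doc); i < len(rec); i++ {
		rec[i] = ' '
	}
	return rec
}

// fitsInPlace reports whether an updated document fits in the record's room.
func fitsInPlace(rec, updated []byte) bool {
	room, _ := binary.Varint(rec[1:docHeader])
	return int64(len(updated)) <= room
}

func main() {
	rec := encodeDoc([]byte(`{"a":1}`))
	fmt.Println(len(rec))                                  // header plus room
	fmt.Println(fitsInPlace(rec, []byte(`{"a":1,"b":2}`))) // small update: overwrite in place
	fmt.Println(fitsInPlace(rec, make([]byte, 100)))       // large update: delete and re-insert
}
```

Reserving extra room trades disk space for fewer relocations: a document can grow to twice its original size before it has to move.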
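The bucket chaining described above can be sketched in memory as follows. This is an illustration under stated assumptions, not the on-disk format: buckets live in a slice rather than a file, a `next` value of 0 means "no chained bucket", keys are non-negative, the bucket is chosen with a plain modulo, and `put` silently ignores a duplicate key/value pair. The small `perBucket` and `initialBuckets` values are chosen only to make chaining easy to observe.

```go
package main

import "fmt"

const (
	perBucket      = 2 // entries per bucket (the package uses PER_BUCKET = 16)
	initialBuckets = 4 // initial buckets (the package uses INITIAL_BUCKETS = 65536)
)

type entry struct {
	valid    bool
	key, val int
}

type bucket struct {
	entries [perBucket]entry
	next    int // index of the chained bucket; 0 means none (assumption)
}

type hashTable struct{ buckets []bucket }

func newHashTable() *hashTable {
	return &hashTable{buckets: make([]bucket, initialBuckets)}
}

// get returns every value stored under key by walking the bucket chain.
func (ht *hashTable) get(key int) []int {
	var vals []int
	b := key % initialBuckets // assumes a non-negative key
	for {
		for _, e := range ht.buckets[b].entries {
			if e.valid && e.key == key {
				vals = append(vals, e.val)
			}
		}
		if ht.buckets[b].next == 0 {
			return vals
		}
		b = ht.buckets[b].next
	}
}

// put stores (key, val) unless that exact pair is already present.
func (ht *hashTable) put(key, val int) {
	for _, v := range ht.get(key) {
		if v == val {
			return // keep the key/value combination unique
		}
	}
	b := key % initialBuckets
	for {
		for i := range ht.buckets[b].entries {
			if e := &ht.buckets[b].entries[i]; !e.valid {
				*e = entry{valid: true, key: key, val: val}
				return
			}
		}
		if ht.buckets[b].next == 0 {
			// The bucket is full and has no successor: chain a new one.
			ht.buckets = append(ht.buckets, bucket{})
			ht.buckets[b].next = len(ht.buckets) - 1
		}
		b = ht.buckets[b].next
	}
}

func main() {
	ht := newHashTable()
	ht.put(1, 10)
	ht.put(1, 11)
	ht.put(1, 12) // bucket 1 is full, so this lands in a chained bucket
	ht.put(1, 10) // duplicate pair, ignored
	fmt.Println(ht.get(1), len(ht.buckets))
}
```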
(Collection) Partition is a collection data file accompanied by a hash table, which allows a document to be addressed by an unchanging ID: the hash table stores the unchanging ID as the entry key and the physical document location as the entry value.
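The indirection a partition provides can be sketched with two maps. This is a toy model, assuming nothing about the real file layout: `docs` stands in for the collection file, `index` for the hash table, and the names `partition`, `insert`, `relocate`, and `read` are hypothetical.

```go
package main

import "fmt"

// partition pairs a toy document store with an ID-to-location index.
type partition struct {
	docs  map[int][]byte // stand-in for the collection file: physical location -> document
	index map[int]int    // stand-in for the hash table: unchanging ID -> physical location
	next  int            // next free physical location
}

func newPartition() *partition {
	return &partition{docs: map[int][]byte{}, index: map[int]int{}}
}

// insert stores the document at a fresh location and records it under id.
func (p *partition) insert(id int, doc []byte) {
	p.docs[p.next] = doc
	p.index[id] = p.next
	p.next++
}

// relocate simulates an update that no longer fits in place: the document
// moves to a new physical location and only the index entry changes.
func (p *partition) relocate(id int, doc []byte) {
	delete(p.docs, p.index[id])
	p.insert(id, doc)
}

// read resolves the unchanging ID to a location, then fetches the document.
func (p *partition) read(id int) ([]byte, bool) {
	loc, ok := p.index[id]
	if !ok {
		return nil, false
	}
	return p.docs[loc], true
}

func main() {
	p := newPartition()
	p.insert(42, []byte(`{"a":1}`))
	before := p.index[42]
	p.relocate(42, []byte(`{"a":1,"much":"longer"}`))
	doc, _ := p.read(42)
	fmt.Println(string(doc), before != p.index[42])
}
```

Callers keep using ID 42 throughout, even though the physical location has changed.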
Index ¶
- Constants
- func GetPartitionRange(partNum, totalParts int) (start int, end int)
- func HashKey(key int) int
- func LooksEmpty(buf gommap.MMap) bool
- type Collection
- type DataFile
- type HashTable
- type Partition
- func (part *Partition) ApproxDocCount() int
- func (part *Partition) Clear() error
- func (part *Partition) Close() error
- func (part *Partition) Delete(id int) (err error)
- func (part *Partition) ForEachDoc(partNum, totalPart int, fun func(id int, doc []byte) bool) (moveOn bool)
- func (part *Partition) Insert(id int, data []byte) (physID int, err error)
- func (part *Partition) LockUpdate(id int) (err error)
- func (part *Partition) Read(id int) ([]byte, error)
- func (part *Partition) UnlockUpdate(id int)
- func (part *Partition) Update(id int, data []byte) (err error)
Constants ¶
const (
	COL_FILE_GROWTH = 32 * 1048576 // Collection file initial size & size growth (32 MBytes)
	DOC_MAX_ROOM    = 2 * 1048576  // Max document size (2 MBytes)
	DOC_HEADER      = 1 + 10       // Document header size - validity (single byte), document room (int 10 bytes)
	// Pre-compiled document padding (128 spaces)
	PADDING     = "" /* 128-byte string literal not displayed */
	LEN_PADDING = len(PADDING)
)
const (
	HT_FILE_GROWTH  = 32 * 1048576                          // Hash table file initial size & file growth
	ENTRY_SIZE      = 1 + 10 + 10                           // Hash entry size: validity (single byte), key (int 10 bytes), value (int 10 bytes)
	BUCKET_HEADER   = 10                                    // Bucket header size: next chained bucket number (int 10 bytes)
	PER_BUCKET      = 16                                    // Entries per bucket
	HASH_BITS       = 16                                    // Number of hash key bits
	BUCKET_SIZE     = BUCKET_HEADER + PER_BUCKET*ENTRY_SIZE // Size of a bucket
	INITIAL_BUCKETS = 65536                                 // Initial number of buckets == 2 ^ HASH_BITS
)
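As a worked check of the hash table constants, the arithmetic can be reproduced directly. The lowercase names below are local copies of the values listed above, not the package's identifiers. A bucket comes to 346 bytes, and the 65536 initial buckets occupy 22,675,456 bytes, which is less than the 32 MByte initial file size.

```go
package main

import "fmt"

const (
	htFileGrowth   = 32 * 1048576
	entrySize      = 1 + 10 + 10
	bucketHeader   = 10
	perBucket      = 16
	hashBits       = 16
	bucketSize     = bucketHeader + perBucket*entrySize
	initialBuckets = 1 << hashBits
)

func main() {
	fmt.Println(bucketSize)                                // bytes per bucket
	fmt.Println(initialBuckets)                            // 2 ^ hashBits
	fmt.Println(initialBuckets * bucketSize)               // bytes taken by the initial buckets
	fmt.Println(initialBuckets*bucketSize <= htFileGrowth) // do they fit in the initial file size?
}
```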
Variables ¶
This section is empty.
Functions ¶
func GetPartitionRange ¶
Divide the entire hash table into roughly equally sized partitions, and return the start/end key range of the chosen partition.
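One way to divide the key space is sketched below. It is not claimed to be the package's implementation: `partitionRange` is a hypothetical name, the range is treated as half-open (start inclusive, end exclusive), and any remainder is spread one key at a time over the first partitions.

```go
package main

import "fmt"

const initialBuckets = 65536 // size of the key space being divided

// partitionRange returns the half-open range [start, end) assigned to
// partition partNum (counted from 0) out of totalParts partitions.
func partitionRange(partNum, totalParts int) (start, end int) {
	perPart := initialBuckets / totalParts
	leftOver := initialBuckets % totalParts
	// Each of the first leftOver partitions takes one extra key.
	extraBefore := partNum
	if extraBefore > leftOver {
		extraBefore = leftOver
	}
	start = partNum*perPart + extraBefore
	end = start + perPart
	if partNum < leftOver {
		end++
	}
	return start, end
}

func main() {
	for part := 0; part < 3; part++ {
		start, end := partitionRange(part, 3)
		fmt.Println(part, start, end)
	}
}
```

With three partitions the ranges are contiguous, cover every key exactly once, and differ in size by at most one.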
func HashKey ¶
Smear the integer entry key and return the portion (HASH_BITS bits) used for allocating the entry.
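The general shape of such a function is sketched below: mix the bits of the key, then mask the result down to a bucket number. The mixing steps are illustrative and are not claimed to match the package's; `smearKey` is a hypothetical name.

```go
package main

import "fmt"

const hashBits = 16 // same value as HASH_BITS

// smearKey mixes the bits of key and keeps the low hashBits bits,
// giving a bucket number in [0, 1<<hashBits).
func smearKey(key int) int {
	x := uint64(key)
	x ^= x >> 4
	x = (x ^ 0xdeadbeef) + (x << 5)
	x ^= x >> 11
	return int(x & ((1 << hashBits) - 1))
}

func main() {
	for _, key := range []int{0, 1, 2, 65536, 123456789} {
		fmt.Println(key, smearKey(key))
	}
}
```

Mixing before masking matters because sequential or aligned keys would otherwise pile into the same few buckets.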
func LooksEmpty ¶
Return true if the buffer begins with 64 consecutive zero bytes.
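A check of this kind can be sketched over a plain byte slice. The function below is a hypothetical stand-in that takes `[]byte` rather than `gommap.MMap`, and it assumes a buffer shorter than 64 bytes does not count as empty.

```go
package main

import "fmt"

const emptyPrefix = 64 // number of leading zero bytes to look for

// looksEmpty reports whether buf begins with emptyPrefix zero bytes.
func looksEmpty(buf []byte) bool {
	if len(buf) < emptyPrefix {
		return false // assumption: too short to decide, treat as not empty
	}
	for _, b := range buf[:emptyPrefix] {
		if b != 0 {
			return false
		}
	}
	return true
}

func main() {
	fresh := make([]byte, 128)
	used := make([]byte, 128)
	used[10] = 1
	fmt.Println(looksEmpty(fresh), looksEmpty(used))
}
```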
Types ¶
type Collection ¶
type Collection struct {
*DataFile
}
Collection file contains document headers and document text data.
func OpenCollection ¶
func OpenCollection(path string) (col *Collection, err error)
Open a collection file.
func (*Collection) ForEachDoc ¶
func (col *Collection) ForEachDoc(fun func(id int, doc []byte) bool)
Run the function on every document; stop when the function returns false.
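The callback protocol (keep going while the function returns true, stop on false) can be illustrated with an in-memory stand-in. The `store` type below is hypothetical and uses slice indexes as IDs; only the callback signature mirrors the one above.

```go
package main

import "fmt"

// store is a toy replacement for a collection: documents held in a slice.
type store struct{ docs [][]byte }

// forEachDoc calls fun with each document's index and content,
// stopping as soon as fun returns false.
func (s *store) forEachDoc(fun func(id int, doc []byte) bool) {
	for id, doc := range s.docs {
		if !fun(id, doc) {
			return
		}
	}
}

func main() {
	s := &store{docs: [][]byte{[]byte("a"), []byte("b"), []byte("c")}}
	seen := 0
	s.forEachDoc(func(id int, doc []byte) bool {
		seen++
		return string(doc) != "b" // returning false at "b" ends the scan
	})
	fmt.Println(seen)
}
```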
func (*Collection) Insert ¶
func (col *Collection) Insert(data []byte) (id int, err error)
Insert a new document, return the new document ID.
func (*Collection) Read ¶
func (col *Collection) Read(id int) []byte
Find and retrieve a document by ID (physical document location). Return value is a copy of the document.
type DataFile ¶
Data file keeps track of the amount of total and used space.
func OpenDataFile ¶
Open a data file that grows by the specified size.
func (*DataFile) EnsureSize ¶
Ensure there is enough room for that many bytes of data.
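The bookkeeping behind growing a file in fixed increments can be sketched without touching the file system. The `dataFile` struct and `ensureSize` method below are hypothetical and only track numbers; a real implementation would also have to extend and remap the file.

```go
package main

import "fmt"

// dataFile tracks used and total space for a file that grows in fixed steps.
type dataFile struct {
	used   int // bytes already occupied
	size   int // total bytes currently available
	growth int // bytes added per growth step; must be positive
}

// ensureSize grows the file until `more` additional bytes fit after the
// used portion, and reports whether any growth was needed.
func (f *dataFile) ensureSize(more int) (grew bool) {
	for f.used+more > f.size {
		f.size += f.growth
		grew = true
	}
	return grew
}

func main() {
	f := &dataFile{used: 30, size: 32, growth: 32}
	fmt.Println(f.ensureSize(2), f.size)  // already fits
	fmt.Println(f.ensureSize(40), f.size) // needs two growth steps
}
```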
type HashTable ¶
Hash table file is a binary file containing buckets of hash entries.
func OpenHashTable ¶
Open a hash table file.
func (*HashTable) GetPartition ¶
Return all entries in the chosen partition.
type Partition ¶
Partition associates a hash table with collection documents, allowing addressing of a document using an unchanging ID.
func OpenPartition ¶
Open a collection partition.
func (*Partition) ApproxDocCount ¶
Return approximate number of documents in the partition.
func (*Partition) ForEachDoc ¶
func (part *Partition) ForEachDoc(partNum, totalPart int, fun func(id int, doc []byte) bool) (moveOn bool)
Partition documents into roughly equally sized portions, and run the function on every document in the portion.
func (*Partition) Insert ¶
Insert a document. The ID may be used to retrieve/update/delete the document later on.
func (*Partition) LockUpdate ¶
Lock a document for exclusive update.
func (*Partition) UnlockUpdate ¶
Unlock a document to make it ready for the next update.
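A per-document update lock of this kind can be sketched with a mutex-guarded set of locked IDs. This is an assumption about the behaviour, based only on `LockUpdate` returning an error: the sketch fails immediately instead of waiting when the document is already locked. The `updateLocks` type and its methods are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// updateLocks records which document IDs are currently locked for update.
type updateLocks struct {
	mu     sync.Mutex
	locked map[int]struct{}
}

func newUpdateLocks() *updateLocks {
	return &updateLocks{locked: map[int]struct{}{}}
}

// lockUpdate claims the document, or returns an error if it is already claimed.
func (l *updateLocks) lockUpdate(id int) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, held := l.locked[id]; held {
		return errors.New("document is locked for update")
	}
	l.locked[id] = struct{}{}
	return nil
}

// unlockUpdate releases the document so the next update can claim it.
func (l *updateLocks) unlockUpdate(id int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.locked, id)
}

func main() {
	locks := newUpdateLocks()
	fmt.Println(locks.lockUpdate(7)) // first claim succeeds
	fmt.Println(locks.lockUpdate(7)) // second claim fails until unlocked
	locks.unlockUpdate(7)
	fmt.Println(locks.lockUpdate(7)) // succeeds again
}
```

A typical caller would lock, read, update, and unlock, using `defer` for the unlock so the document is released on every return path.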