Documentation ¶
Index ¶
- func NewLocalBlobAccess(digestLocationMap DigestLocationMap, blockAllocator BlockAllocator, ...) (blobstore.BlobAccess, error)
- type Block
- type BlockAllocator
- type CompactDigest
- type DigestLocationMap
- type Location
- type LocationRecord
- type LocationRecordArray
- type LocationRecordKey
- type LocationValidator
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func NewLocalBlobAccess ¶
func NewLocalBlobAccess(digestLocationMap DigestLocationMap, blockAllocator BlockAllocator, storageType blobstore.StorageType, name string, sectorSizeBytes int, blockSectorCount int64, oldBlocksCount int, currentBlocksCount int, newBlocksCount int) (blobstore.BlobAccess, error)
NewLocalBlobAccess creates a caching storage backend that stores data on the local system (e.g., on disk or in memory). This backend works by storing blobs in blocks. Blobs cannot span multiple blocks, meaning that blocks generally need to be large in size (gigabytes). The number of blocks may be relatively low. For example, for a 512 GiB cache, it is acceptable to create 32 blocks of 16 GiB in size.
Blocks are partitioned into three groups based on their creation time, named "old", "current" and "new". Blobs provided to Put() will always be stored in a block in the "new" group. When the oldest block in the "new" group becomes full, it is moved to the "current" group. This causes the oldest block in the "current" group to be displaced to the "old" group. The oldest block in the "old" group is discarded.
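The rotation described above can be sketched as follows. The names blockGroups and rotate are illustrative only; they are not part of this package's API:

```go
package main

// blockGroups holds the three groups of blocks, oldest first.
type blockGroups struct {
	old, current, new []string // block identifiers, oldest first
}

// rotate is invoked when the oldest block in the "new" group becomes
// full: that block moves into "current", the oldest "current" block is
// displaced into "old", and the oldest "old" block is discarded.
func (g *blockGroups) rotate(freshBlock string) {
	g.old = append(g.old[1:], g.current[0])     // discard oldest "old", take oldest "current"
	g.current = append(g.current[1:], g.new[0]) // take the filled-up "new" block
	g.new = append(g.new[1:], freshBlock)       // start filling a fresh block
}
```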
The difference between the "current" group and the "old" group lies in how data is treated when requested through Get() and FindMissing(). Data in the "old" group is at risk of being removed in the near future, which is why it is copied into the "new" group when requested. Data in the "current" group is assumed to remain present for the time being, which is why it is left in place.
Below is an illustration of how the blocks of data may be laid out at a given point in time. Every column of █ characters corresponds to a single block. The number of characters indicates the amount of data stored within.
        ← Over time, blocks move from "new" to "current" to "old" ←

          Old         Current          New
      █ █ █ █ │ █ █ █ █ █ █ █ █ │
      █ █ █ █ │ █ █ █ █ █ █ █ █ │
      █ █ █ █ │ █ █ █ █ █ █ █ █ │
      █ █ █ █ │ █ █ █ █ █ █ █ █ │
      █ █ █ █ │ █ █ █ █ █ █ █ █ │ █
      █ █ █ █ │ █ █ █ █ █ █ █ █ │ █
      █ █ █ █ │ █ █ █ █ █ █ █ █ │ █ █
      █ █ █ █ │ █ █ █ █ █ █ █ █ │ █ █ █
      ↓ ↓ ↓ ↓                     ↑ ↑ ↑ ↑
      └─┴─┴─┴─────────────────────┴─┴─┴─┘
         Data gets copied from "old" to
             "new" when requested.
Blobs get stored in blocks in the "new" group with an inverse exponential probability. This is done to reduce the probability of multiple block rotations close after each other, as this might put excessive pressure on the garbage collector. Because the placement distribution decreases rapidly, having more than three or four "new" blocks would be wasteful. Having fewer is also not recommended, as that increases the chance of placing objects that are used together inside the same block. This may cause 'tidal waves' of I/O whenever such data ends up in the "old" group at once.
After initialization, there will be fewer blocks in the "current" group than configured, due to there simply being no data. This is compensated by adding more blocks to the "new" group. Unlike the regular blocks in this group, these will have a uniform placement distribution that is twice as high as normal. This is done to ensure the "current" blocks are randomly seeded to reduce 'tidal waves' later on.
The number of blocks in the "old" group should not be too low, as this would cause this storage backend to become a FIFO instead of being LRU-like. Setting it too high is also not recommended, as this would increase redundancy in the data stored. The "current" group should likely be two or three times as large as the "old" group.
Types ¶
type Block ¶
type Block interface {
	Get(digest digest.Digest, offsetBytes int64, sizeBytes int64) buffer.Buffer
	Put(offsetBytes int64, b buffer.Buffer) error
	Release()
}
Block of storage that contains a sequence of blobs. Buffers returned by Get() must remain valid, even if Release() is called.
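A simplified in-memory analogue of this contract, with []byte standing in for buffer.Buffer and the digest parameter omitted, might look as follows. Note how Get returns a copy, so that the requirement above (buffers stay valid even after Release) holds:

```go
package main

// memoryBlock is a toy block backed by a byte slice; it is not part of
// this package's API.
type memoryBlock struct {
	data []byte
}

// Get returns a copy of the requested region, so the result remains
// valid even after the block is released.
func (b *memoryBlock) Get(offsetBytes, sizeBytes int64) []byte {
	out := make([]byte, sizeBytes)
	copy(out, b.data[offsetBytes:offsetBytes+sizeBytes])
	return out
}

// Put writes a blob at the given offset within the block.
func (b *memoryBlock) Put(offsetBytes int64, p []byte) error {
	copy(b.data[offsetBytes:], p)
	return nil
}

// Release discards the block's storage.
func (b *memoryBlock) Release() {
	b.data = nil // allow the underlying storage to be reclaimed
}
```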
type BlockAllocator ¶
BlockAllocator is used by LocalBlobAccess to allocate large blocks of storage (in-memory or on-disk) at a time. These blocks are then filled with blobs that are stored without any padding in between.
func NewInMemoryBlockAllocator ¶
func NewInMemoryBlockAllocator(blockSize int) BlockAllocator
NewInMemoryBlockAllocator creates a block allocator that stores its blocks directly in memory, backed by a simple byte slice. The byte slice is fully allocated up front; it does not grow lazily to the desired size.
func NewPartitioningBlockAllocator ¶
func NewPartitioningBlockAllocator(f blockdevice.ReadWriterAt, storageType blobstore.StorageType, sectorSizeBytes int, blockSectorCount int64, blockCount int, disableIntegrityChecking bool) BlockAllocator
NewPartitioningBlockAllocator implements a BlockAllocator that can be used by LocalBlobAccess to store data. Blocks created by this allocator are backed by a single ReadWriterAt. Storage is partitioned into equally sized blocks that are stored consecutively.
Blocks are initially handed out in order of increasing offset. Later on, the least recently released blocks are reused, which adds wear leveling to the system.
This implementation also ensures that all writes against the underlying storage are performed at sector boundaries and span whole numbers of sectors, so that no unnecessary reads need to be performed.
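The alignment rule amounts to rounding every size up to the next sector boundary. A minimal sketch, using a hypothetical helper name that is not part of this package's API:

```go
package main

// roundUpToSectors rounds a byte count up to a whole number of
// sectors, so that all I/O can be issued at sector granularity.
func roundUpToSectors(sizeBytes, sectorSizeBytes int64) int64 {
	sectors := (sizeBytes + sectorSizeBytes - 1) / sectorSizeBytes
	return sectors * sectorSizeBytes
}
```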
type CompactDigest ¶
CompactDigest is a binary representation of digest.Digest that is used as the key type by DigestLocationMap. It is not possible to convert a CompactDigest back to a digest.Digest.
func NewCompactDigest ¶
func NewCompactDigest(key string) CompactDigest
NewCompactDigest creates a new CompactDigest based on a key that is returned by digest.Digest.GetKey().
type DigestLocationMap ¶
type DigestLocationMap interface {
	Get(digest CompactDigest, validator *LocationValidator) (Location, error)
	Put(digest CompactDigest, validator *LocationValidator, location Location) error
}
DigestLocationMap is equivalent to a map[CompactDigest]Location. It is used by LocalBlobAccess to track where blobs are stored, so that they may be accessed. Implementations are permitted to discard entries for outdated locations during lookups/insertions using the provided validator.
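A toy analogue of this interface, backed by an ordinary Go map, can illustrate the self-cleaning behaviour. String keys, a minimal location type and a bounds-checking validator stand in for CompactDigest, Location and LocationValidator; none of these names appear in the package itself:

```go
package main

import "errors"

// location identifies a block by number; validator knows which block
// numbers are still in use.
type location struct{ blockID int }

type validator struct{ oldestBlockID, newestBlockID int }

func (v *validator) isValid(l location) bool {
	return l.blockID >= v.oldestBlockID && l.blockID <= v.newestBlockID
}

var errNotFound = errors.New("blob not found")

type mapDigestLocationMap map[string]location

func (m mapDigestLocationMap) get(key string, v *validator) (location, error) {
	l, ok := m[key]
	if !ok {
		return location{}, errNotFound
	}
	if !v.isValid(l) {
		// Self-cleaning: drop entries whose blocks have rotated away.
		delete(m, key)
		return location{}, errNotFound
	}
	return l, nil
}

func (m mapDigestLocationMap) put(key string, v *validator, l location) error {
	if !v.isValid(l) {
		return errors.New("location is already invalid")
	}
	m[key] = l
	return nil
}
```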
func NewHashingDigestLocationMap ¶
func NewHashingDigestLocationMap(recordArray LocationRecordArray, recordsCount int, hashInitialization uint64, maximumGetAttempts uint32, maximumPutAttempts int, name string) DigestLocationMap
NewHashingDigestLocationMap creates a DigestLocationMap backed by a hash table that uses a strategy similar to cuckoo hashing to handle collisions. By displacing entries for older locations in favour of newer locations, older locations are gradually pushed to the 'outside' of the hash table.
Because Get() and Put() take a LocationValidator, they can treat entries for locations that no longer exist as invalid. This allows the hash table to be self-cleaning.
Because the hash table has a limited size (and does not resize), there is a risk of the hash collision rate becoming too high. Insertions into a full hash table would even deadlock. To prevent this from happening, there is a fixed upper bound on the number of iterations Get() and Put() are willing to run. Records are discarded once this upper bound is reached. Though this may sound harmful, there is a very high probability that the entry being discarded is one of the older ones.
type Location ¶
Location at which a blob is stored within blocks managed by a LocalBlobAccess. A location consists of a number that identifies a block and the region within the block.
type LocationRecord ¶
type LocationRecord struct {
	Key      LocationRecordKey
	Location Location
}
LocationRecord is a key-value pair that contains information on where a blob may be found.
type LocationRecordArray ¶
type LocationRecordArray interface {
	Get(index int) LocationRecord
	Put(index int, locationRecord LocationRecord)
}
LocationRecordArray is equivalent to a []LocationRecord. It is used as the backing store by HashingDigestLocationMap. Instead of storing data in a slice in memory, an implementation could store this information on disk for a persistent data store.
func NewInMemoryLocationRecordArray ¶
func NewInMemoryLocationRecordArray(size int) LocationRecordArray
NewInMemoryLocationRecordArray creates a LocationRecordArray that stores its data in memory. LocalBlobAccess relies on being able to store a mapping from digest.Digests to a location in memory or on disk. This type implements a non-persistent storage of such a map in memory.
type LocationRecordKey ¶
type LocationRecordKey struct {
	Digest  CompactDigest
	Attempt uint32
}
LocationRecordKey contains a compact, partial binary representation of digest.Digest that is used to identify blobs in HashingDigestLocationMap.
Because HashingDigestLocationMap uses open addressing, LocationRecords may be stored at alternative, less preferred indices. The Attempt field contains the probing distance at which the record is stored.
func (*LocationRecordKey) Hash ¶
func (k *LocationRecordKey) Hash(hashInitialization uint64) uint64
Hash a LocationRecordKey using FNV-1a. Instead of using the well-known offset basis of 14695981039346656037, a custom initialization may be provided. This permits mirrored instances to each use a different offset basis.
In the unlikely event that the collision rate on the hash table becomes too high, records may simply get lost. By letting mirrored instances use different offset bases, it becomes less likely that both instances lose the same record.
For non-persistent setups, it is advised to use a randomly chosen offset basis to prevent collision attacks.
type LocationValidator ¶
LocationValidator assesses whether a Location of where a blob is stored is still valid. It does this by bounds checking the ID number of the block.
func (*LocationValidator) IsValid ¶
func (v *LocationValidator) IsValid(l Location) bool
IsValid returns whether the provided Location still refers to a place where the data corresponding to this blob may be retrieved.