Documentation ¶
Index ¶
- func NewLocalBlobAccess(digestLocationMap DigestLocationMap, blockAllocator BlockAllocator, ...) (blobstore.BlobAccess, error)
- type Block
- type BlockAllocator
- type CompactDigest
- type DigestLocationMap
- type Location
- type LocationRecord
- type LocationRecordArray
- type LocationRecordKey
- type LocationValidator
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func NewLocalBlobAccess ¶
func NewLocalBlobAccess(digestLocationMap DigestLocationMap, blockAllocator BlockAllocator, storageType blobstore.StorageType, name string, sectorSizeBytes int, blockSectorCount int64, oldBlocksCount int, currentBlocksCount int, newBlocksCount int) (blobstore.BlobAccess, error)
NewLocalBlobAccess creates a caching storage backend that stores data on the local system (e.g., on disk or in memory). This backend works by storing blobs in blocks. Blobs cannot span multiple blocks, meaning that blocks generally need to be large in size (gigabytes). The number of blocks may be relatively low. For example, for a 512 GiB cache, it is acceptable to create 32 blocks of 16 GiB in size.
Blocks are partitioned into three groups based on their creation time, named "old", "current" and "new". Blobs provided to Put() will always be stored in a block in the "new" group. When the oldest block in the "new" group becomes full, it is moved to the "current" group. This causes the oldest block in the "current" group to be displaced to the "old" group. The oldest block in the "old" group is discarded.
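The rotation described above can be sketched as follows. The names blockGroups and rotate are illustrative only; they are not part of this package's API:

```go
package main

// blockGroups holds the three groups of blocks, oldest first.
type blockGroups struct {
	old, current, new []string // block identifiers, oldest first
}

// rotate is invoked when the oldest block in the "new" group becomes
// full: that block moves into "current", the oldest "current" block is
// displaced into "old", and the oldest "old" block is discarded.
func (g *blockGroups) rotate(freshBlock string) {
	g.old = append(g.old[1:], g.current[0])     // discard oldest "old", take oldest "current"
	g.current = append(g.current[1:], g.new[0]) // take the filled-up "new" block
	g.new = append(g.new[1:], freshBlock)       // start filling a fresh block
}
```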
The difference between the "current" group and the "old" group lies in how data is treated when requested through Get() and FindMissing(). Data in the "old" group is at risk of being removed in the near future, which is why it is copied into the "new" group when requested. Data in the "current" group is assumed to remain present for the time being, which is why it is left in place.
Below is an illustration of how the blocks of data may be laid out at a given point in time. Every column of █ characters corresponds to a single block. The number of characters indicates the amount of data stored within.
        ← Over time, blocks move from "new" to "current" to "old" ←

          Old         Current          New
      █ █ █ █ │ █ █ █ █ █ █ █ █ │
      █ █ █ █ │ █ █ █ █ █ █ █ █ │
      █ █ █ █ │ █ █ █ █ █ █ █ █ │
      █ █ █ █ │ █ █ █ █ █ █ █ █ │
      █ █ █ █ │ █ █ █ █ █ █ █ █ │ █
      █ █ █ █ │ █ █ █ █ █ █ █ █ │ █
      █ █ █ █ │ █ █ █ █ █ █ █ █ │ █ █
      █ █ █ █ │ █ █ █ █ █ █ █ █ │ █ █ █
      ↓ ↓ ↓ ↓                     ↑ ↑ ↑ ↑
      └─┴─┴─┴─────────────────────┴─┴─┴─┘
         Data gets copied from "old" to
             "new" when requested.
Blobs get stored in blocks in the "new" group with an inverse exponential probability. This is done to reduce the probability of multiple block rotations close after each other, as this might put excessive pressure on the garbage collector. Because the placement distribution decreases rapidly, having more than three or four "new" blocks would be wasteful. Having fewer is also not recommended, as that increases the chance of placing objects that are used together inside the same block. This may cause 'tidal waves' of I/O whenever such data ends up in the "old" group at once.
After initialization, there will be fewer blocks in the "current" group than configured, due to there simply being no data. This is compensated by adding more blocks to the "new" group. Unlike the regular blocks in this group, these will have a uniform placement distribution that is twice as high as normal. This is done to ensure the "current" blocks are randomly seeded to reduce 'tidal waves' later on.
The number of blocks in the "old" group should not be too low, as this would cause this storage backend to become a FIFO instead of being LRU-like. Setting it too high is also not recommended, as this would increase redundancy in the data stored. The "current" group should likely be two or three times as large as the "old" group.
Types ¶
type Block ¶
type Block interface {
	Get(digest digest.Digest, offsetBytes int64, sizeBytes int64) buffer.Buffer
	Put(offsetBytes int64, b buffer.Buffer) error
	Release()
}
Block of storage that contains a sequence of blobs. Buffers returned by Get() must remain valid, even if Release() is called.
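A simplified in-memory analogue of this contract, with []byte standing in for buffer.Buffer and the digest parameter omitted, might look as follows. Note how Get returns a copy, so that the requirement above (buffers stay valid even after Release) holds:

```go
package main

// memoryBlock is a toy block backed by a byte slice; it is not part of
// this package's API.
type memoryBlock struct {
	data []byte
}

// Get returns a copy of the requested region, so the result remains
// valid even after the block is released.
func (b *memoryBlock) Get(offsetBytes, sizeBytes int64) []byte {
	out := make([]byte, sizeBytes)
	copy(out, b.data[offsetBytes:offsetBytes+sizeBytes])
	return out
}

// Put writes a blob at the given offset within the block.
func (b *memoryBlock) Put(offsetBytes int64, p []byte) error {
	copy(b.data[offsetBytes:], p)
	return nil
}

// Release discards the block's storage.
func (b *memoryBlock) Release() {
	b.data = nil // allow the underlying storage to be reclaimed
}
```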
type BlockAllocator ¶
BlockAllocator is used by LocalBlobAccess to allocate large blocks of storage (in-memory or on-disk) at a time. These blocks are then filled with blobs that are stored without any padding in between.
func NewInMemoryBlockAllocator ¶
func NewInMemoryBlockAllocator(blockSize int) BlockAllocator
NewInMemoryBlockAllocator creates a block allocator that stores its blocks directly in memory, backed by a simple byte slice. The byte slice is fully allocated up front; it does not grow lazily to the desired size.
func NewPartitioningBlockAllocator ¶
func NewPartitioningBlockAllocator(f blockdevice.ReadWriterAt, storageType blobstore.StorageType, sectorSizeBytes int, blockSectorCount int64, blockCount int, disableIntegrityChecking bool) BlockAllocator
NewPartitioningBlockAllocator implements a BlockAllocator that can be used by LocalBlobAccess to store data. Blocks created by this allocator are backed by a single ReadWriterAt. Storage is partitioned into equally sized blocks that are stored consecutively.
Blocks are initially handed out in order of increasing offset. Later on, the least recently released blocks are reused, which adds wear leveling to the system.
This implementation also ensures that all writes against the underlying storage are performed at sector boundaries and span whole numbers of sectors, so that no unnecessary reads need to be performed.
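The alignment rule amounts to rounding every size up to the next sector boundary. A minimal sketch, using a hypothetical helper name that is not part of this package's API:

```go
package main

// roundUpToSectors rounds a byte count up to a whole number of
// sectors, so that all I/O can be issued at sector granularity.
func roundUpToSectors(sizeBytes, sectorSizeBytes int64) int64 {
	sectors := (sizeBytes + sectorSizeBytes - 1) / sectorSizeBytes
	return sectors * sectorSizeBytes
}
```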
type CompactDigest ¶
CompactDigest is a binary representation of digest.Digest that is used as the key type by DigestLocationMap. It is not possible to convert a CompactDigest back to a digest.Digest.
func NewCompactDigest ¶
func NewCompactDigest(key string) CompactDigest
NewCompactDigest creates a new CompactDigest based on a key that is returned by digest.Digest.GetKey().
type DigestLocationMap ¶
type DigestLocationMap interface {
	Get(digest CompactDigest, validator *LocationValidator) (Location, error)
	Put(digest CompactDigest, validator *LocationValidator, location Location) error
}
DigestLocationMap is equivalent to a map[CompactDigest]Location. It is used by LocalBlobAccess to track where blobs are stored, so that they may be accessed. Implementations are permitted to discard entries for outdated locations during lookups/insertions using the provided validator.
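A toy analogue of this interface, backed by an ordinary Go map, can illustrate the self-cleaning behaviour. String keys, a minimal location type and a bounds-checking validator stand in for CompactDigest, Location and LocationValidator; none of these names appear in the package itself:

```go
package main

import "errors"

// location identifies a block by number; validator knows which block
// numbers are still in use.
type location struct{ blockID int }

type validator struct{ oldestBlockID, newestBlockID int }

func (v *validator) isValid(l location) bool {
	return l.blockID >= v.oldestBlockID && l.blockID <= v.newestBlockID
}

var errNotFound = errors.New("blob not found")

type mapDigestLocationMap map[string]location

func (m mapDigestLocationMap) get(key string, v *validator) (location, error) {
	l, ok := m[key]
	if !ok {
		return location{}, errNotFound
	}
	if !v.isValid(l) {
		// Self-cleaning: drop entries whose blocks have rotated away.
		delete(m, key)
		return location{}, errNotFound
	}
	return l, nil
}

func (m mapDigestLocationMap) put(key string, v *validator, l location) error {
	if !v.isValid(l) {
		return errors.New("location is already invalid")
	}
	m[key] = l
	return nil
}
```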
func NewHashingDigestLocationMap ¶
func NewHashingDigestLocationMap(recordArray LocationRecordArray, recordsCount int, hashInitialization uint64, maximumGetAttempts uint32, maximumPutAttempts int, name string) DigestLocationMap
NewHashingDigestLocationMap creates a DigestLocationMap backed by a hash table that uses a strategy similar to cuckoo hashing to handle collisions. By displacing entries for older locations in favour of newer locations, older locations are gradually pushed to the 'outside' of the hash table.
Because Get() and Put() take a LocationValidator, they can treat entries for locations that no longer exist as invalid. This allows the hash table to be self-cleaning.
Because the hash table has a limited size (and does not resize), there is a risk of the hash collision rate becoming too high. Insertions into a full hash table would even deadlock. To prevent this from happening, there is a fixed upper bound on the number of iterations Get() and Put() are willing to run. Records are discarded once this upper bound is reached. Though this may sound harmful, there is a very high probability that the entry being discarded is one of the older ones.
type Location ¶
Location at which a blob is stored within blocks managed by a LocalBlobAccess. A location consists of a number that identifies a block and the region within the block.
type LocationRecord ¶
type LocationRecord struct {
	Key      LocationRecordKey
	Location Location
}
LocationRecord is a key-value pair that contains information on where a blob may be found.
type LocationRecordArray ¶
type LocationRecordArray interface {
	Get(index int) LocationRecord
	Put(index int, locationRecord LocationRecord)
}
LocationRecordArray is equivalent to a []LocationRecord. It is used as the backing store by HashingDigestLocationMap. Instead of storing data in a slice in memory, an implementation could store this information on disk for a persistent data store.
func NewInMemoryLocationRecordArray ¶
func NewInMemoryLocationRecordArray(size int) LocationRecordArray
NewInMemoryLocationRecordArray creates a LocationRecordArray that stores its data in memory. LocalBlobAccess relies on being able to store a mapping from digest.Digests to a location in memory or on disk. This type implements a non-persistent storage of such a map in memory.
type LocationRecordKey ¶
type LocationRecordKey struct {
	Digest  CompactDigest
	Attempt uint32
}
LocationRecordKey contains a compact, partial binary representation of digest.Digest that is used to identify blobs in HashingDigestLocationMap.
Because HashingDigestLocationMap uses open addressing, LocationRecords may be stored at alternative, less preferred indices. The Attempt field contains the probing distance at which the record is stored.
func (*LocationRecordKey) Hash ¶
func (k *LocationRecordKey) Hash(hashInitialization uint64) uint64
Hash a LocationRecordKey using FNV-1a. Instead of using the well-known offset basis of 14695981039346656037, a custom initialization may be provided. This permits mirrored instances to each use a different offset basis.
In the unlikely event that the collision rate on the hash table becomes too high, records may simply get lost. By letting mirrored instances use different offset bases, it becomes less likely that both instances lose the same record.
For non-persistent setups, it is advised to use a randomly chosen offset basis to prevent collision attacks.
type LocationValidator ¶
LocationValidator assesses whether a Location of where a blob is stored is still valid. It does this by bounds checking the ID number of the block.
func (*LocationValidator) IsValid ¶
func (v *LocationValidator) IsValid(l Location) bool
IsValid returns whether the provided Location still refers to a place where the data corresponding to this blob may be retrieved.