README
Noms Block Store
A horizontally scalable storage backend for Noms.
Overview
NBS is a storage layer optimized for the needs of the Noms database.
NBS can run in two configurations: backed by local disk, or backed by Amazon AWS.
When backed by local disk, NBS is significantly faster than LevelDB for our workloads and supports full multiprocess concurrency.
When backed by AWS, NBS stores its data mainly in S3, along with a single DynamoDB item. This configuration makes Noms "effectively CA", in the sense that Noms is always consistent, and Noms+NBS is as available as DynamoDB and S3 are. This configuration also gives Noms the cost profile of S3 with power closer to that of a traditional database.
Details
- NBS provides storage for a content-addressed DAG of nodes (with exactly one root), where each node is encoded as a sequence of bytes and addressed by a 20-byte hash of the byte-sequence.
- There is no update or delete -- only insert, update root, and garbage collect.
- Insertion of any novel byte-sequence is durable only upon updating the root (see the sketch after this list).
- File-level multiprocess concurrency is supported, with optimistic locking for multiple writers.
- Writers need not worry about re-writing duplicate chunks. NBS will efficiently detect and drop (most) duplicates.
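To make the durability rule concrete, here is a minimal sketch using the local store. The import paths and chunks.NewChunk helper assume the attic-labs/noms repository layout; the directory, memtable size, and the choice to point the root at the new chunk's hash are purely illustrative.

package main

import (
	"fmt"

	"github.com/attic-labs/noms/go/chunks"
	"github.com/attic-labs/noms/go/nbs"
)

func main() {
	// Illustrative memtable size (128 MB); not a recommended production value.
	store := nbs.NewLocalStore("/tmp/nbs-example", 1<<27)
	defer store.Close()

	// Put stages the chunk in memory; it is NOT yet durable.
	c := chunks.NewChunk([]byte("hello world"))
	store.Put(c)

	// The insertion becomes durable only once the root is updated.
	// Commit succeeds only if the supplied last root is still current.
	if store.Commit(c.Hash(), store.Root()) {
		fmt.Println("root updated; chunk is now durable:", c.Hash())
	}
}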
Perf
For the file back-end, performance is substantially better than LevelDB, mainly because LDB spends substantial IO keeping KV pairs in key order, which doesn't benefit Noms at all. NBS locates related chunks together, so reading data from an NBS store is considerably faster. As an example, storing & retrieving a 1.1 GB MP4 video file on a 2.9 GHz i5 MacBook Pro:
- LDB
- Initial import: 44 MB/s, size on disk: 1.1 GB.
- Import exact same bytes: 35 MB/s, size on disk: 1.4 GB.
- Export: 60 MB/s
- NBS
- Initial import: 72 MB/s, size on disk: 1.1 GB.
- Import exact same bytes: 92 MB/s, size on disk: 1.1 GB.
- Export: 300 MB/s
Status
NBS is more or less "beta". There's still work we want to do, but it now works better than LevelDB for our purposes, so we have made it the default local backend for Noms:
# This uses nbs locally:
./csv-import foo.csv /Users/bob/csv-store::data
The AWS backend is available via the aws: scheme:
./csv-import foo.csv aws://table:bucket::data
Documentation
Index
- Constants
- func NewAWSStoreFactory(sess *session.Session, table, bucket string, maxOpenFiles int, ...) chunks.Factory
- func NewLocalStoreFactory(dir string, indexCacheSize uint64, maxOpenFiles int) chunks.Factory
- func ParseAddr(b []byte) (h addr)
- func ValidateAddr(s string) bool
- type AWSStoreFactory
- type LocalStoreFactory
- type NomsBlockCache
- func (nbc *NomsBlockCache) Count() uint32
- func (nbc *NomsBlockCache) Destroy() error
- func (nbc *NomsBlockCache) ExtractChunks(chunkChan chan *chunks.Chunk)
- func (nbc *NomsBlockCache) Get(hash hash.Hash) chunks.Chunk
- func (nbc *NomsBlockCache) GetMany(hashes hash.HashSet, foundChunks chan *chunks.Chunk)
- func (nbc *NomsBlockCache) Has(hash hash.Hash) bool
- func (nbc *NomsBlockCache) HasMany(hashes hash.HashSet) hash.HashSet
- func (nbc *NomsBlockCache) Insert(c chunks.Chunk)
- type NomsBlockStore
- func (nbs *NomsBlockStore) CalcReads(hashes hash.HashSet, blockSize uint64) (reads int, split bool)
- func (nbs *NomsBlockStore) Close() (err error)
- func (nbs *NomsBlockStore) Commit(current, last hash.Hash) bool
- func (nbs *NomsBlockStore) Count() uint32
- func (nbs *NomsBlockStore) Get(h hash.Hash) chunks.Chunk
- func (nbs *NomsBlockStore) GetMany(hashes hash.HashSet, foundChunks chan *chunks.Chunk)
- func (nbs *NomsBlockStore) Has(h hash.Hash) bool
- func (nbs *NomsBlockStore) HasMany(hashes hash.HashSet) hash.HashSet
- func (nbs *NomsBlockStore) Put(c chunks.Chunk)
- func (nbs *NomsBlockStore) Rebase()
- func (nbs *NomsBlockStore) Root() hash.Hash
- func (nbs *NomsBlockStore) Stats() interface{}
- func (nbs *NomsBlockStore) StatsSummary() string
- func (nbs *NomsBlockStore) Version() string
- type Stats
Constants
const (
// StorageVersion is the version of the on-disk Noms Chunks Store data format.
StorageVersion = "4"
)
Variables
This section is empty.
Functions
func NewAWSStoreFactory
func NewAWSStoreFactory(sess *session.Session, table, bucket string, maxOpenFiles int, indexCacheSize, tableCacheSize uint64, tableCacheDir string) chunks.Factory
NewAWSStoreFactory returns a ChunkStore factory that vends NomsBlockStore instances that store manifests in the named DynamoDB table, and chunk data in the named S3 bucket. All connections to AWS services share |sess|.
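A hedged usage sketch: the region, table, bucket, namespace, cache sizes, and the openAWSStore helper are all placeholders; the import paths assume the attic-labs/noms repository layout, and the session setup uses the standard aws-sdk-go API.

import (
	"github.com/attic-labs/noms/go/chunks"
	"github.com/attic-labs/noms/go/nbs"
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
)

// openAWSStore is a hypothetical helper; names and sizes are placeholders.
func openAWSStore() (chunks.ChunkStore, chunks.Factory) {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-west-2")}))
	factory := nbs.NewAWSStoreFactory(
		sess,
		"noms-manifest-table", // DynamoDB table holding manifests
		"noms-chunk-bucket",   // S3 bucket holding chunk data
		128,                   // maxOpenFiles
		1<<26,                 // indexCacheSize, in bytes
		1<<30,                 // tableCacheSize, in bytes
		"/tmp/nbs-table-cache",
	)
	// Callers should eventually call factory.Shutter() to release resources.
	return factory.CreateStore("my-namespace"), factory
}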
func NewLocalStoreFactory
func NewLocalStoreFactory(dir string, indexCacheSize uint64, maxOpenFiles int) chunks.Factory
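By analogy with the AWS sketch above, a local-disk sketch; the directory, cache size, open-file limit, and namespace are illustrative.

// Illustrative values: 64 MB index cache, at most 64 open table files.
factory := nbs.NewLocalStoreFactory("/path/to/store", 1<<26, 64)
defer factory.Shutter()

store := factory.CreateStore("data") // "data" is a placeholder namespace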
func ParseAddr
func ParseAddr(b []byte) (h addr)
func ValidateAddr
func ValidateAddr(s string) bool
Types
type AWSStoreFactory
type AWSStoreFactory struct {
// contains filtered or unexported fields
}
AWSStoreFactory vends NomsBlockStores built on top of DynamoDB and S3.
func (*AWSStoreFactory) CreateStore
func (asf *AWSStoreFactory) CreateStore(ns string) chunks.ChunkStore
func (*AWSStoreFactory) CreateStoreFromCache
func (asf *AWSStoreFactory) CreateStoreFromCache(ns string) chunks.ChunkStore
func (*AWSStoreFactory) Shutter
func (asf *AWSStoreFactory) Shutter()
type LocalStoreFactory
type LocalStoreFactory struct {
// contains filtered or unexported fields
}
func (*LocalStoreFactory) CreateStore
func (lsf *LocalStoreFactory) CreateStore(ns string) chunks.ChunkStore
func (*LocalStoreFactory) CreateStoreFromCache
func (lsf *LocalStoreFactory) CreateStoreFromCache(ns string) chunks.ChunkStore
func (*LocalStoreFactory) Shutter
func (lsf *LocalStoreFactory) Shutter()
type NomsBlockCache
type NomsBlockCache struct {
// contains filtered or unexported fields
}
NomsBlockCache holds Chunks, allowing them to be retrieved by hash or enumerated in hash order.
func NewCache
func NewCache() *NomsBlockCache
func (*NomsBlockCache) Count
func (nbc *NomsBlockCache) Count() uint32
Count returns the number of items in the cache.
func (*NomsBlockCache) Destroy
func (nbc *NomsBlockCache) Destroy() error
Destroy drops the cache and deletes any backing storage.
func (*NomsBlockCache) ExtractChunks
func (nbc *NomsBlockCache) ExtractChunks(chunkChan chan *chunks.Chunk)
ExtractChunks writes the entire contents of the cache to chunkChan. The chunks are extracted in insertion order.
func (*NomsBlockCache) Get
func (nbc *NomsBlockCache) Get(hash hash.Hash) chunks.Chunk
Get retrieves the chunk referenced by hash. If the chunk is not present, Get returns the empty Chunk.
func (*NomsBlockCache) GetMany
func (nbc *NomsBlockCache) GetMany(hashes hash.HashSet, foundChunks chan *chunks.Chunk)
GetMany gets the Chunks with |hashes| from the store. On return, all chunks that were found will have been sent to |foundChunks|. Any non-present chunks are silently ignored.
func (*NomsBlockCache) Has
func (nbc *NomsBlockCache) Has(hash hash.Hash) bool
Has checks if the chunk referenced by hash is in the cache.
func (*NomsBlockCache) HasMany
func (nbc *NomsBlockCache) HasMany(hashes hash.HashSet) hash.HashSet
HasMany returns a set containing the members of hashes present in the cache.
func (*NomsBlockCache) Insert
func (nbc *NomsBlockCache) Insert(c chunks.Chunk)
Insert stores c in the cache.
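A sketch of the cache lifecycle; chunk construction assumes chunks.NewChunk from the noms chunks package, and the channel sizing assumes ExtractChunks sends synchronously, per its doc above.

cache := nbs.NewCache()
defer cache.Destroy() // drops the cache and deletes its backing storage

c := chunks.NewChunk([]byte("some bytes"))
cache.Insert(c)

if cache.Has(c.Hash()) {
	got := cache.Get(c.Hash())
	fmt.Println(got.Hash())
}

// Drain the cache in insertion order; the buffer is sized so the
// sends cannot block.
out := make(chan *chunks.Chunk, cache.Count())
cache.ExtractChunks(out)
close(out)
for extracted := range out {
	fmt.Println(extracted.Hash())
}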
type NomsBlockStore
type NomsBlockStore struct {
// contains filtered or unexported fields
}
func NewAWSStore
func NewAWSStore(table, ns, bucket string, s3 s3svc, ddb ddbsvc, memTableSize uint64) *NomsBlockStore
func NewLocalStore
func NewLocalStore(dir string, memTableSize uint64) *NomsBlockStore
func (*NomsBlockStore) Close
func (nbs *NomsBlockStore) Close() (err error)
func (*NomsBlockStore) Count
func (nbs *NomsBlockStore) Count() uint32
func (*NomsBlockStore) GetMany
func (nbs *NomsBlockStore) GetMany(hashes hash.HashSet, foundChunks chan *chunks.Chunk)
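NomsBlockStore.GetMany carries no doc comment here, but by analogy with NomsBlockCache.GetMany above, all found chunks have been sent on |foundChunks| by the time the call returns, so the channel can be closed immediately afterward. A sketch, reusing the store from the earlier local-store example; HashSet construction assumes the noms hash package's map-based HashSet.

c1 := chunks.NewChunk([]byte("first"))
c2 := chunks.NewChunk([]byte("second"))
store.Put(c1)
store.Put(c2)

hashes := hash.HashSet{}
hashes.Insert(c1.Hash())
hashes.Insert(c2.Hash())

found := make(chan *chunks.Chunk, len(hashes))
store.GetMany(hashes, found)
close(found) // safe: all found chunks were sent before GetMany returned

for c := range found {
	fmt.Println(c.Hash())
}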
func (*NomsBlockStore) HasMany
func (nbs *NomsBlockStore) HasMany(hashes hash.HashSet) hash.HashSet
func (*NomsBlockStore) Put
func (nbs *NomsBlockStore) Put(c chunks.Chunk)
func (*NomsBlockStore) Rebase
func (nbs *NomsBlockStore) Rebase()
func (*NomsBlockStore) Root
func (nbs *NomsBlockStore) Root() hash.Hash
func (*NomsBlockStore) Stats
func (nbs *NomsBlockStore) Stats() interface{}
func (*NomsBlockStore) StatsSummary
func (nbs *NomsBlockStore) StatsSummary() string
func (*NomsBlockStore) Version
func (nbs *NomsBlockStore) Version() string
type Stats
type Stats struct {
	OpenLatency   metrics.Histogram
	CommitLatency metrics.Histogram

	IndexReadLatency  metrics.Histogram
	IndexBytesPerRead metrics.Histogram

	GetLatency   metrics.Histogram
	ChunksPerGet metrics.Histogram

	FileReadLatency  metrics.Histogram
	FileBytesPerRead metrics.Histogram

	S3ReadLatency  metrics.Histogram
	S3BytesPerRead metrics.Histogram

	MemReadLatency  metrics.Histogram
	MemBytesPerRead metrics.Histogram

	DynamoReadLatency  metrics.Histogram
	DynamoBytesPerRead metrics.Histogram

	HasLatency      metrics.Histogram
	AddressesPerHas metrics.Histogram

	PutLatency metrics.Histogram

	PersistLatency                   metrics.Histogram
	BytesPerPersist                  metrics.Histogram
	ChunksPerPersist                 metrics.Histogram
	CompressedChunkBytesPerPersist   metrics.Histogram
	UncompressedChunkBytesPerPersist metrics.Histogram

	ConjoinLatency   metrics.Histogram
	BytesPerConjoin  metrics.Histogram
	ChunksPerConjoin metrics.Histogram
	TablesPerConjoin metrics.Histogram

	ReadManifestLatency  metrics.Histogram
	WriteManifestLatency metrics.Histogram
}
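These histograms are most easily consumed in aggregate via StatsSummary; a minimal sketch, where the path and memtable size are placeholders.

store := nbs.NewLocalStore("/path/to/store", 1<<27)
defer store.Close()

// ... reads and writes against store ...

// StatsSummary renders a human-readable rollup of the histograms above.
fmt.Println(store.StatsSummary())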
Source Files
- aws_chunk_source.go
- aws_table_persister.go
- cache.go
- conjoiner.go
- dynamo_manifest.go
- dynamo_table_reader.go
- factory.go
- fd_cache.go
- file_manifest.go
- file_table_persister.go
- fs_table_cache.go
- manifest.go
- manifest_cache.go
- mem_table.go
- mmap_table_reader.go
- persisting_chunk_source.go
- s3_table_reader.go
- stats.go
- store.go
- table.go
- table_persister.go
- table_reader.go
- table_set.go
- table_writer.go