Documentation ¶
Index ¶
- Constants
- func CheckOrdering(cmp Compare, format base.FormatKey, level Level, files LevelIterator, ...) error
- func LevelToInt(l Level) int
- func SortBySeqNum(files []*FileMetadata)
- func SortBySmallest(files []*FileMetadata, cmp Compare)
- type Annotator
- type BulkVersionEdit
- type CompactionState
- type Compare
- type DeletedFileEntry
- type FileBacking
- type FileMetadata
- func (m *FileMetadata) ContainedWithinSpan(cmp Compare, start, end []byte) bool
- func (m *FileMetadata) ContainsKeyType(kt KeyType) bool
- func (m *FileMetadata) DebugString(format base.FormatKey, verbose bool) string
- func (m *FileMetadata) ExtendPointKeyBounds(cmp Compare, smallest, largest InternalKey) *FileMetadata
- func (m *FileMetadata) ExtendRangeKeyBounds(cmp Compare, smallest, largest InternalKey) *FileMetadata
- func (m *FileMetadata) InitPhysicalBacking()
- func (m *FileMetadata) InitProviderBacking(fileNum base.DiskFileNum)
- func (m *FileMetadata) IsCompacting() bool
- func (m *FileMetadata) LargestBound(kt KeyType) (*InternalKey, bool)
- func (m *FileMetadata) LatestRef()
- func (m *FileMetadata) LatestRefs() int32
- func (m *FileMetadata) LatestUnref() int32
- func (m *FileMetadata) Overlaps(cmp Compare, start []byte, end []byte, exclusiveEnd bool) bool
- func (m *FileMetadata) PhysicalMeta() PhysicalFileMeta
- func (m *FileMetadata) Ref()
- func (m *FileMetadata) Refs() int32
- func (m *FileMetadata) SetCompactionState(to CompactionState)
- func (m *FileMetadata) SmallestBound(kt KeyType) (*InternalKey, bool)
- func (m *FileMetadata) StatsMarkValid()
- func (m *FileMetadata) StatsValid() bool
- func (m *FileMetadata) String() string
- func (m *FileMetadata) TableInfo() TableInfo
- func (m *FileMetadata) Unref() int32
- func (m *FileMetadata) Validate(cmp Compare, formatKey base.FormatKey) error
- func (m *FileMetadata) ValidateVirtual(createdFrom *FileMetadata)
- func (m *FileMetadata) VirtualMeta() VirtualFileMeta
- type InternalKey
- type KeyType
- type L0Compaction
- type L0CompactionFiles
- type L0Sublevels
- func (s *L0Sublevels) AddL0Files(files []*FileMetadata, flushSplitMaxBytes int64, levelMetadata *LevelMetadata) (*L0Sublevels, error)
- func (s *L0Sublevels) ExtendL0ForBaseCompactionTo(smallest, largest InternalKey, candidate *L0CompactionFiles) bool
- func (s *L0Sublevels) FlushSplitKeys() [][]byte
- func (s *L0Sublevels) InUseKeyRanges(smallest, largest []byte) []UserKeyRange
- func (s *L0Sublevels) InitCompactingFileInfo(inProgress []L0Compaction)
- func (s *L0Sublevels) MaxDepthAfterOngoingCompactions() int
- func (s *L0Sublevels) PickBaseCompaction(minCompactionDepth int, baseFiles LevelSlice) (*L0CompactionFiles, error)
- func (s *L0Sublevels) PickIntraL0Compaction(earliestUnflushedSeqNum uint64, minCompactionDepth int) (*L0CompactionFiles, error)
- func (s *L0Sublevels) ReadAmplification() int
- func (s *L0Sublevels) String() string
- func (s *L0Sublevels) UpdateStateForStartedCompaction(inputs []LevelSlice, isBase bool) error
- type Level
- type LevelFile
- type LevelIterator
- func (i *LevelIterator) Clone() LevelIterator
- func (i *LevelIterator) Current() *FileMetadata
- func (i *LevelIterator) Filter(keyType KeyType) LevelIterator
- func (i *LevelIterator) First() *FileMetadata
- func (i *LevelIterator) Last() *FileMetadata
- func (i *LevelIterator) Next() *FileMetadata
- func (i *LevelIterator) Prev() *FileMetadata
- func (i *LevelIterator) SeekGE(cmp Compare, userKey []byte) *FileMetadata
- func (i *LevelIterator) SeekLT(cmp Compare, userKey []byte) *FileMetadata
- func (i LevelIterator) String() string
- func (i *LevelIterator) Take() LevelFile
- type LevelMetadata
- func (lm *LevelMetadata) Annotation(annotator Annotator) interface{}
- func (lm *LevelMetadata) Empty() bool
- func (lm *LevelMetadata) Find(cmp base.Compare, m *FileMetadata) *LevelFile
- func (lm *LevelMetadata) InvalidateAnnotation(annotator Annotator)
- func (lm *LevelMetadata) Iter() LevelIterator
- func (lm *LevelMetadata) Len() int
- func (lm *LevelMetadata) Size() uint64
- func (lm *LevelMetadata) Slice() LevelSlice
- type LevelSlice
- func (ls LevelSlice) Each(fn func(*FileMetadata))
- func (ls *LevelSlice) Empty() bool
- func (ls *LevelSlice) Iter() LevelIterator
- func (ls *LevelSlice) Len() int
- func (ls *LevelSlice) NumVirtual() uint64
- func (ls LevelSlice) Reslice(resliceFunc func(start, end *LevelIterator)) LevelSlice
- func (ls *LevelSlice) SizeSum() uint64
- func (ls LevelSlice) String() string
- func (ls *LevelSlice) VirtualSizeSum() uint64
- type NewFileEntry
- type OrderingInvariants
- type PhysicalFileMeta
- type TableInfo
- type TableStats
- type UserKeyRange
- type Version
- func AccumulateIncompleteAndApplySingleVE(ve *VersionEdit, curr *Version, cmp Compare, formatKey base.FormatKey, ...) (_ *Version, zombies map[base.DiskFileNum]uint64, _ error)
- func NewVersion(cmp Compare, formatKey base.FormatKey, flushSplitBytes int64, ...) *Version
- func ParseVersionDebug(cmp Compare, formatKey base.FormatKey, flushSplitBytes int64, s string) (*Version, error)
- func (v *Version) CheckOrdering(cmp Compare, format base.FormatKey, order OrderingInvariants) error
- func (v *Version) Contains(level int, cmp Compare, m *FileMetadata) bool
- func (v *Version) DebugString(format base.FormatKey) string
- func (v *Version) InitL0Sublevels(cmp Compare, formatKey base.FormatKey, flushSplitBytes int64) error
- func (v *Version) Next() *Version
- func (v *Version) Overlaps(level int, cmp Compare, start, end []byte, exclusiveEnd bool) LevelSlice
- func (v *Version) Ref()
- func (v *Version) Refs() int32
- func (v *Version) String() string
- func (v *Version) Unref()
- func (v *Version) UnrefLocked()
- type VersionEdit
- type VersionList
- type VirtualFileMeta
Constants ¶
const NumLevels = 7
NumLevels is the number of levels a Version contains.
Variables ¶
This section is empty.
Functions ¶
func CheckOrdering ¶
func CheckOrdering( cmp Compare, format base.FormatKey, level Level, files LevelIterator, ordering OrderingInvariants, ) error
CheckOrdering checks that the files are consistent with respect to seqnums (for level 0 files -- see detailed comment below) and increasing and non- overlapping internal key ranges (for non-level 0 files).
The ordering field may be passed AllowSplitUserKeys to allow adjacent files that are both inclusive of the same user key. Pebble no longer creates version edits installing such files, and Pebble databases with sufficiently high format major version should no longer have any such files within their LSM. TODO(jackson): Remove AllowSplitUserKeys when we remove support for the earlier format major versions.
func SortBySeqNum ¶
func SortBySeqNum(files []*FileMetadata)
SortBySeqNum sorts the specified files by increasing sequence number.
func SortBySmallest ¶
func SortBySmallest(files []*FileMetadata, cmp Compare)
SortBySmallest sorts the specified files by smallest key using the supplied comparison function to order user keys.
Types ¶
type Annotator ¶
type Annotator interface { // Zero returns the zero value of an annotation. This value is returned // when a LevelMetadata is empty. The dst argument, if non-nil, is an // obsolete value previously returned by this Annotator and may be // overwritten and reused to avoid a memory allocation. Zero(dst interface{}) (v interface{}) // Accumulate computes the annotation for a single file in a level's // metadata. It merges the file's value into dst and returns a bool flag // indicating whether or not the value is stable and okay to cache as an // annotation. If the file's value may change over the life of the file, // the annotator must return false. // // Implementations may modify dst and return it to avoid an allocation. Accumulate(m *FileMetadata, dst interface{}) (v interface{}, cacheOK bool) // Merge combines two values src and dst, returning the result. // Implementations may modify dst and return it to avoid an allocation. Merge(src interface{}, dst interface{}) interface{} }
An Annotator defines a computation over a level's FileMetadata. If the computation is stable and uses inputs that are fixed for the lifetime of a FileMetadata, the LevelMetadata's internal data structures are annotated with the intermediary computations. This allows the computation to be computed incrementally as edits are applied to a level.
type BulkVersionEdit ¶
type BulkVersionEdit struct { Added [NumLevels]map[base.FileNum]*FileMetadata Deleted [NumLevels]map[base.FileNum]*FileMetadata // AddedFileBacking is a map to support lookup so that we can populate the // FileBacking of virtual sstables during manifest replay. AddedFileBacking map[base.DiskFileNum]*FileBacking RemovedFileBacking []base.DiskFileNum // AddedByFileNum maps file number to file metadata for all added files // from accumulated version edits. AddedByFileNum is only populated if set // to non-nil by a caller. It must be set to non-nil when replaying // version edits read from a MANIFEST (as opposed to VersionEdits // constructed in-memory). While replaying a MANIFEST file, // VersionEdit.DeletedFiles map entries have nil values, because the // on-disk deletion record encodes only the file number. Accumulate // uses AddedByFileNum to correctly populate the BulkVersionEdit's Deleted // field with non-nil *FileMetadata. AddedByFileNum map[base.FileNum]*FileMetadata // MarkedForCompactionCountDiff holds the aggregated count of files // marked for compaction added or removed. MarkedForCompactionCountDiff int }
BulkVersionEdit summarizes the files added and deleted from a set of version edits.
INVARIANTS: No file can be added to a level more than once. This is true globally, and also true for all of the calls to Accumulate for a single bulk version edit.
No file can be removed from a level more than once. This is true globally, and also true for all of the calls to Accumulate for a single bulk version edit.
A file must not be added and removed from a given level in the same version edit.
A file that is being removed from a level must have been added to that level before (in a prior version edit). Note that a given file can be deleted from a level and added to another level in a single version edit
func (*BulkVersionEdit) Accumulate ¶
func (b *BulkVersionEdit) Accumulate(ve *VersionEdit) error
Accumulate adds the file addition and deletions in the specified version edit to the bulk edit's internal state.
INVARIANTS: If a file is added to a given level in a call to Accumulate and then removed from that level in a subsequent call, the file will not be present in the resulting BulkVersionEdit.Deleted for that level.
After accumulation of version edits, the bulk version edit may have information about a file which has been deleted from a level, but it may not have information about the same file added to the same level. The add could've occurred as part of a previous bulk version edit. In this case, the deleted file must be present in BulkVersionEdit.Deleted, at the end of the accumulation, because we need to decrease the refcount of the deleted file in Apply.
func (*BulkVersionEdit) Apply ¶
func (b *BulkVersionEdit) Apply( curr *Version, cmp Compare, formatKey base.FormatKey, flushSplitBytes int64, readCompactionRate int64, zombies map[base.DiskFileNum]uint64, orderingInvariants OrderingInvariants, ) (*Version, error)
Apply applies the delta b to the current version to produce a new version. The new version is consistent with respect to the comparer cmp.
curr may be nil, which is equivalent to a pointer to a zero version.
On success, if a non-nil zombies map is provided to Apply, the map is updated with file numbers and files sizes of deleted files. These files are considered zombies because they are no longer referenced by the returned Version, but cannot be deleted from disk as they are still in use by the incoming Version.
type CompactionState ¶
type CompactionState uint8
CompactionState is the compaction state of a file.
The following shows the valid state transitions:
NotCompacting --> Compacting --> Compacted ^ | | | +-------<-------+
Input files to a compaction transition to Compacting when a compaction is picked. A file that has finished compacting typically transitions into the Compacted state, at which point it is effectively obsolete ("zombied") and will eventually be removed from the LSM. A file that has been move-compacted will transition from Compacting back into the NotCompacting state, signaling that the file may be selected for a subsequent compaction. A failed compaction will result in all input tables transitioning from Compacting to NotCompacting.
This state is in-memory only. It is not persisted to the manifest.
const ( CompactionStateNotCompacting CompactionState = iota CompactionStateCompacting CompactionStateCompacted )
CompactionStates.
func (CompactionState) String ¶
func (s CompactionState) String() string
String implements fmt.Stringer.
type DeletedFileEntry ¶
DeletedFileEntry holds the state for a file deletion from a level. The file itself might still be referenced by another level.
type FileBacking ¶ added in v1.1.0
type FileBacking struct { // VirtualizedSize is set iff the backing sst is only referred to by // virtual ssts in the latest version. VirtualizedSize is the sum of the // virtual sstable sizes of all of the virtual sstables in the latest // version which are backed by the physical sstable. When a virtual // sstable is removed from the latest version, we will decrement the // VirtualizedSize. During compaction picking, we'll compensate a // virtual sstable file size by // (FileBacking.Size - FileBacking.VirtualizedSize) / latestVersionRefs. // The intuition is that if FileBacking.Size - FileBacking.VirtualizedSize // is high, then the space amplification due to virtual sstables is // high, and we should pick the virtual sstable with a higher priority. // // TODO(bananabrick): Compensate the virtual sstable file size using // the VirtualizedSize during compaction picking and test. VirtualizedSize atomic.Uint64 DiskFileNum base.DiskFileNum Size uint64 // contains filtered or unexported fields }
FileBacking either backs a single physical sstable, or one or more virtual sstables.
See the comment above the FileMetadata type for sstable terminology.
type FileMetadata ¶
type FileMetadata struct { // AllowedSeeks is used to determine if a file should be picked for // a read triggered compaction. It is decremented when read sampling // in pebble.Iterator after every after every positioning operation // that returns a user key (eg. Next, Prev, SeekGE, SeekLT, etc). AllowedSeeks atomic.Int64 // FileBacking is the state which backs either a physical or virtual // sstables. FileBacking *FileBacking // InitAllowedSeeks is the inital value of allowed seeks. This is used // to re-set allowed seeks on a file once it hits 0. InitAllowedSeeks int64 // FileNum is the file number. // // INVARIANT: when !FileMetadata.Virtual, FileNum == FileBacking.DiskFileNum. FileNum base.FileNum // Size is the size of the file, in bytes. Size is an approximate value for // virtual sstables. // // INVARIANTS: // - When !FileMetadata.Virtual, Size == FileBacking.Size. // - Size should be non-zero. Size 0 virtual sstables must not be created. Size uint64 // File creation time in seconds since the epoch (1970-01-01 00:00:00 // UTC). For ingested sstables, this corresponds to the time the file was // ingested. For virtual sstables, this corresponds to the wall clock time // when the FileMetadata for the virtual sstable was first created. CreationTime int64 // Lower and upper bounds for the smallest and largest sequence numbers in // the table, across both point and range keys. For physical sstables, these // values are tight bounds. For virtual sstables, there is no guarantee that // there will be keys with SmallestSeqNum or LargestSeqNum within virtual // sstable bounds. SmallestSeqNum uint64 LargestSeqNum uint64 // SmallestPointKey and LargestPointKey are the inclusive bounds for the // internal point keys stored in the table. This includes RANGEDELs, which // alter point keys. // NB: these field should be set using ExtendPointKeyBounds. They are left // exported for reads as an optimization. SmallestPointKey InternalKey LargestPointKey InternalKey // SmallestRangeKey and LargestRangeKey are the inclusive bounds for the // internal range keys stored in the table. // NB: these field should be set using ExtendRangeKeyBounds. They are left // exported for reads as an optimization. SmallestRangeKey InternalKey LargestRangeKey InternalKey // Smallest and Largest are the inclusive bounds for the internal keys stored // in the table, across both point and range keys. // NB: these fields are derived from their point and range key equivalents, // and are updated via the MaybeExtend{Point,Range}KeyBounds methods. Smallest InternalKey Largest InternalKey // Stats describe table statistics. Protected by DB.mu. // // For virtual sstables, set stats upon virtual sstable creation as // asynchronous computation of stats is not currently supported. // // TODO(bananabrick): To support manifest replay for virtual sstables, we // probably need to compute virtual sstable stats asynchronously. Otherwise, // we'd have to write virtual sstable stats to the version edit. Stats TableStats // For L0 files only. Protected by DB.mu. Used to generate L0 sublevels and // pick L0 compactions. Only accurate for the most recent Version. SubLevel int L0Index int // IsIntraL0Compacting is set to True if this file is part of an intra-L0 // compaction. When it's true, IsCompacting must also return true. If // Compacting is true and IsIntraL0Compacting is false for an L0 file, the // file must be part of a compaction to Lbase. IsIntraL0Compacting bool CompactionState CompactionState // True if compaction of this file has been explicitly requested. // Previously, RocksDB and earlier versions of Pebble allowed this // flag to be set by a user table property collector. Some earlier // versions of Pebble respected this flag, while other more recent // versions ignored this flag. // // More recently this flag has been repurposed to facilitate the // compaction of 'atomic compaction units'. Files marked for // compaction are compacted in a rewrite compaction at the lowest // possible compaction priority. // // NB: A count of files marked for compaction is maintained on // Version, and compaction picking reads cached annotations // determined by this field. // // Protected by DB.mu. MarkedForCompaction bool // HasPointKeys tracks whether the table contains point keys (including // RANGEDELs). If a table contains only range deletions, HasPointsKeys is // still true. HasPointKeys bool // HasRangeKeys tracks whether the table contains any range keys. HasRangeKeys bool // Virtual is true if the FileMetadata belongs to a virtual sstable. Virtual bool // contains filtered or unexported fields }
FileMetadata is maintained for leveled-ssts, i.e., they belong to a level of some version. FileMetadata does not contain the actual level of the sst, since such leveled-ssts can move across levels in different versions, while sharing the same FileMetadata. There are two kinds of leveled-ssts, physical and virtual. Underlying both leveled-ssts is a backing-sst, for which the only state is FileBacking. A backing-sst is level-less. It is possible for a backing-sst to be referred to by a physical sst in one version and by one or more virtual ssts in one or more versions. A backing-sst becomes obsolete and can be deleted once it is no longer required by any physical or virtual sst in any version.
We maintain some invariants:
Each physical and virtual sst will have a unique FileMetadata.FileNum, and there will be exactly one FileMetadata associated with the FileNum.
Within a version, a backing-sst is either only referred to by one physical sst or one or more virtual ssts.
Once a backing-sst is referred to by a virtual sst in the latest version, it cannot go back to being referred to by a physical sst in any future version.
Once a physical sst is no longer needed by any version, we will no longer maintain the file metadata associated with it. We will still maintain the FileBacking associated with the physical sst if the backing sst is required by any virtual ssts in any version.
func ParseFileMetadataDebug ¶
func ParseFileMetadataDebug(s string) (*FileMetadata, error)
ParseFileMetadataDebug parses a FileMetadata from its DebugString representation.
func (*FileMetadata) ContainedWithinSpan ¶ added in v1.1.0
func (m *FileMetadata) ContainedWithinSpan(cmp Compare, start, end []byte) bool
ContainedWithinSpan returns true if the file key range completely overlaps with the given range ("end" is assumed to exclusive).
func (*FileMetadata) ContainsKeyType ¶
func (m *FileMetadata) ContainsKeyType(kt KeyType) bool
ContainsKeyType returns whether or not the file contains keys of the provided type.
func (*FileMetadata) DebugString ¶
func (m *FileMetadata) DebugString(format base.FormatKey, verbose bool) string
DebugString returns a verbose representation of FileMetadata, typically for use in tests and debugging, returning the file number and the point, range and overall bounds for the table.
func (*FileMetadata) ExtendPointKeyBounds ¶
func (m *FileMetadata) ExtendPointKeyBounds( cmp Compare, smallest, largest InternalKey, ) *FileMetadata
ExtendPointKeyBounds attempts to extend the lower and upper point key bounds and overall table bounds with the given smallest and largest keys. The smallest and largest bounds may not be extended if the table already has a bound that is smaller or larger, respectively. The receiver is returned. NB: calling this method should be preferred to manually setting the bounds by manipulating the fields directly, to maintain certain invariants.
func (*FileMetadata) ExtendRangeKeyBounds ¶
func (m *FileMetadata) ExtendRangeKeyBounds( cmp Compare, smallest, largest InternalKey, ) *FileMetadata
ExtendRangeKeyBounds attempts to extend the lower and upper range key bounds and overall table bounds with the given smallest and largest keys. The smallest and largest bounds may not be extended if the table already has a bound that is smaller or larger, respectively. The receiver is returned. NB: calling this method should be preferred to manually setting the bounds by manipulating the fields directly, to maintain certain invariants.
func (*FileMetadata) InitPhysicalBacking ¶ added in v1.1.0
func (m *FileMetadata) InitPhysicalBacking()
InitPhysicalBacking allocates and sets the FileBacking which is required by a physical sstable FileMetadata.
Ensure that the state required by FileBacking, such as the FileNum, is already set on the FileMetadata before InitPhysicalBacking is called. Calling InitPhysicalBacking only after the relevant state has been set in the FileMetadata is not necessary in tests which don't rely on FileBacking.
func (*FileMetadata) InitProviderBacking ¶ added in v1.1.0
func (m *FileMetadata) InitProviderBacking(fileNum base.DiskFileNum)
InitProviderBacking creates a new FileBacking for a file backed by an objstorage.Provider.
func (*FileMetadata) IsCompacting ¶
func (m *FileMetadata) IsCompacting() bool
IsCompacting returns true if this file's compaction state is CompactionStateCompacting. Protected by DB.mu.
func (*FileMetadata) LargestBound ¶
func (m *FileMetadata) LargestBound(kt KeyType) (*InternalKey, bool)
LargestBound returns the file's largest bound of the key type. It returns a false second return value if the file does not contain any keys of the key type.
func (*FileMetadata) LatestRef ¶ added in v1.1.0
func (m *FileMetadata) LatestRef()
LatestRef increments the latest ref count associated with the backing sstable.
func (*FileMetadata) LatestRefs ¶ added in v1.1.0
func (m *FileMetadata) LatestRefs() int32
LatestRefs returns the latest ref count associated with the backing sstable.
func (*FileMetadata) LatestUnref ¶ added in v1.1.0
func (m *FileMetadata) LatestUnref() int32
LatestUnref decrements the latest ref count associated with the backing sstable.
func (*FileMetadata) Overlaps ¶
Overlaps returns true if the file key range overlaps with the given range.
func (*FileMetadata) PhysicalMeta ¶ added in v1.1.0
func (m *FileMetadata) PhysicalMeta() PhysicalFileMeta
PhysicalMeta should be the only source of creating the PhysicalFileMeta wrapper type.
func (*FileMetadata) Ref ¶ added in v1.1.0
func (m *FileMetadata) Ref()
Ref increments the ref count associated with the backing sstable.
func (*FileMetadata) Refs ¶
func (m *FileMetadata) Refs() int32
Refs returns the refcount of backing sstable.
func (*FileMetadata) SetCompactionState ¶
func (m *FileMetadata) SetCompactionState(to CompactionState)
SetCompactionState transitions this file's compaction state to the given state. Protected by DB.mu.
func (*FileMetadata) SmallestBound ¶
func (m *FileMetadata) SmallestBound(kt KeyType) (*InternalKey, bool)
SmallestBound returns the file's smallest bound of the key type. It returns a false second return value if the file does not contain any keys of the key type.
func (*FileMetadata) StatsMarkValid ¶
func (m *FileMetadata) StatsMarkValid()
StatsMarkValid marks the TableStats as valid. The caller must hold DB.mu while populating TableStats and calling StatsMarkValud. Once stats are populated, they must not be mutated.
func (*FileMetadata) StatsValid ¶
func (m *FileMetadata) StatsValid() bool
StatsValid returns true if the table stats have been populated. If StatValid returns true, the Stats field may be read (with or without holding the database mutex).
func (*FileMetadata) String ¶
func (m *FileMetadata) String() string
String implements fmt.Stringer, printing the file number and the overall table bounds.
func (*FileMetadata) TableInfo ¶
func (m *FileMetadata) TableInfo() TableInfo
TableInfo returns a subset of the FileMetadata state formatted as a TableInfo.
func (*FileMetadata) Unref ¶ added in v1.1.0
func (m *FileMetadata) Unref() int32
Unref decrements the ref count associated with the backing sstable.
func (*FileMetadata) Validate ¶
func (m *FileMetadata) Validate(cmp Compare, formatKey base.FormatKey) error
Validate validates the metadata for consistency with itself, returning an error if inconsistent.
func (*FileMetadata) ValidateVirtual ¶ added in v1.1.0
func (m *FileMetadata) ValidateVirtual(createdFrom *FileMetadata)
ValidateVirtual should be called once the FileMetadata for a virtual sstable is created to verify that the fields of the virtual sstable are sound.
func (*FileMetadata) VirtualMeta ¶ added in v1.1.0
func (m *FileMetadata) VirtualMeta() VirtualFileMeta
VirtualMeta should be the only source of creating the VirtualFileMeta wrapper type.
type InternalKey ¶
type InternalKey = base.InternalKey
InternalKey exports the base.InternalKey type.
func KeyRange ¶
func KeyRange(ucmp Compare, iters ...LevelIterator) (smallest, largest InternalKey)
KeyRange returns the minimum smallest and maximum largest internalKey for all the FileMetadata in iters.
type KeyType ¶
type KeyType int8
KeyType is used to specify the type of keys we're looking for in LevelIterator positioning operations. Files not containing any keys of the desired type are skipped.
const ( // KeyTypePointAndRange denotes a search among the entire keyspace, including // both point keys and range keys. No sstables are skipped. KeyTypePointAndRange KeyType = iota // KeyTypePoint denotes a search among the point keyspace. SSTables with no // point keys will be skipped. Note that the point keyspace includes rangedels. KeyTypePoint // KeyTypeRange denotes a search among the range keyspace. SSTables with no // range keys will be skipped. KeyTypeRange )
type L0Compaction ¶
type L0Compaction struct { Smallest InternalKey Largest InternalKey IsIntraL0 bool }
L0Compaction describes an active compaction with inputs from L0.
type L0CompactionFiles ¶
type L0CompactionFiles struct { Files []*FileMetadata FilesIncluded bitSet // contains filtered or unexported fields }
L0CompactionFiles represents a candidate set of L0 files for compaction. Also referred to as "lcf". Contains state information useful for generating the compaction (such as Files), as well as for picking between candidate compactions (eg. fileBytes and seedIntervalStackDepthReduction).
func (*L0CompactionFiles) Clone ¶
func (l *L0CompactionFiles) Clone() *L0CompactionFiles
Clone allocates a new L0CompactionFiles, with the same underlying data. Note that the two fileMetadata slices contain values that point to the same underlying fileMetadata object. This is safe because these objects are read only.
func (*L0CompactionFiles) String ¶
func (l *L0CompactionFiles) String() string
String merely prints the starting address of the first file, if it exists.
type L0Sublevels ¶
type L0Sublevels struct { // Levels are ordered from oldest sublevel to youngest sublevel in the // outer slice, and the inner slice contains non-overlapping files for // that sublevel in increasing key order. Levels is constructed from // levelFiles and is used by callers that require a LevelSlice. The below two // fields are treated as immutable once created in NewL0Sublevels. Levels []LevelSlice // contains filtered or unexported fields }
L0Sublevels represents a sublevel view of SSTables in L0. Tables in one sublevel are non-overlapping in key ranges, and keys in higher-indexed sublevels shadow older versions in lower-indexed sublevels. These invariants are similar to the regular level invariants, except with higher indexed sublevels having newer keys as opposed to lower indexed levels.
There is no limit to the number of sublevels that can exist in L0 at any time, however read and compaction performance is best when there are as few sublevels as possible.
func NewL0Sublevels ¶
func NewL0Sublevels( levelMetadata *LevelMetadata, cmp Compare, formatKey base.FormatKey, flushSplitMaxBytes int64, ) (*L0Sublevels, error)
NewL0Sublevels creates an L0Sublevels instance for a given set of L0 files. These files must all be in L0 and must be sorted by seqnum (see SortBySeqNum). During interval iteration, when flushSplitMaxBytes bytes are exceeded in the range of intervals since the last flush split key, a flush split key is added.
This method can be called without DB.mu being held, so any DB.mu protected fields in FileMetadata cannot be accessed here, such as Compacting and IsIntraL0Compacting. Those fields are accessed in InitCompactingFileInfo instead.
func (*L0Sublevels) AddL0Files ¶
func (s *L0Sublevels) AddL0Files( files []*FileMetadata, flushSplitMaxBytes int64, levelMetadata *LevelMetadata, ) (*L0Sublevels, error)
AddL0Files incrementally builds a new L0Sublevels for when the only change since the receiver L0Sublevels was an addition of the specified files, with no L0 deletions. The common case of this is an ingestion or a flush. These files can "sit on top" of existing sublevels, creating at most one new sublevel for a flush (and possibly multiple for an ingestion), and at most 2*len(files) additions to s.orderedIntervals. No files must have been deleted from L0, and the added files must all be newer in sequence numbers than existing files in L0Sublevels. The files parameter must be sorted in seqnum order. The levelMetadata parameter corresponds to the new L0 post addition of files. This method is meant to be significantly more performant than NewL0Sublevels.
Note that this function can only be called once on a given receiver; it appends to some slices in s which is only safe when done once. This is okay, as the common case (generating a new L0Sublevels after a flush/ingestion) is only going to necessitate one call of this method on a given receiver. The returned value, if non-nil, can then have *L0Sublevels.AddL0Files called on it again, and so on. If [errInvalidL0SublevelsOpt] is returned as an error, it likely means the optimization could not be applied (i.e. files added were older than files already in the sublevels, which is possible around ingestions and in tests). Eg. it can happen when an ingested file was ingested without queueing a flush since it did not actually overlap with any keys in the memtable. Later on the memtable was flushed, and the memtable had keys spanning around the ingested file, producing a flushed file that overlapped with the ingested file in file bounds but not in keys. It's possible for that flushed file to have a lower LargestSeqNum than the ingested file if all the additions after the ingestion were to another flushed file that was split into a separate sstable during flush. Any other non-nil error means L0Sublevels generation failed in the same way as NewL0Sublevels would likely fail.
func (*L0Sublevels) ExtendL0ForBaseCompactionTo ¶
func (s *L0Sublevels) ExtendL0ForBaseCompactionTo( smallest, largest InternalKey, candidate *L0CompactionFiles, ) bool
ExtendL0ForBaseCompactionTo extends the specified base compaction candidate L0CompactionFiles to optionally cover more files in L0 without "touching" any of the passed-in keys (i.e. the smallest/largest bounds are exclusive), as including any user keys for those internal keys could require choosing more files in LBase which is undesirable. Unbounded start/end keys are indicated by passing in the InvalidInternalKey.
func (*L0Sublevels) FlushSplitKeys ¶
func (s *L0Sublevels) FlushSplitKeys() [][]byte
FlushSplitKeys returns a slice of user keys to split flushes at. Used by flushes to avoid writing sstables that straddle these split keys. These should be interpreted as the keys to start the next sstable (not the last key to include in the prev sstable). These are user keys so that range tombstones can be properly truncated (untruncated range tombstones are not permitted for L0 files).
func (*L0Sublevels) InUseKeyRanges ¶
func (s *L0Sublevels) InUseKeyRanges(smallest, largest []byte) []UserKeyRange
InUseKeyRanges returns the merged table bounds of L0 files overlapping the provided user key range. The returned key ranges are sorted and nonoverlapping.
func (*L0Sublevels) InitCompactingFileInfo ¶
func (s *L0Sublevels) InitCompactingFileInfo(inProgress []L0Compaction)
InitCompactingFileInfo initializes internal flags relating to compacting files. Must be called after sublevel initialization.
Requires DB.mu *and* the manifest lock to be held.
func (*L0Sublevels) MaxDepthAfterOngoingCompactions ¶
func (s *L0Sublevels) MaxDepthAfterOngoingCompactions() int
MaxDepthAfterOngoingCompactions returns an estimate of maximum depth of sublevels after all ongoing compactions run to completion. Used by compaction picker to decide compaction score for L0. There is no scoring for intra-L0 compactions -- they only run if L0 score is high but we're unable to pick an L0 -> Lbase compaction.
func (*L0Sublevels) PickBaseCompaction ¶
func (s *L0Sublevels) PickBaseCompaction( minCompactionDepth int, baseFiles LevelSlice, ) (*L0CompactionFiles, error)
PickBaseCompaction picks a base compaction based on the above specified heuristics, for the specified Lbase files and a minimum depth of overlapping files that can be selected for compaction. Returns nil if no compaction is possible.
func (*L0Sublevels) PickIntraL0Compaction ¶
func (s *L0Sublevels) PickIntraL0Compaction( earliestUnflushedSeqNum uint64, minCompactionDepth int, ) (*L0CompactionFiles, error)
PickIntraL0Compaction picks an intra-L0 compaction for files in this sublevel. This method is only called when a base compaction cannot be chosen. See comment above [PickBaseCompaction] for heuristics involved in this selection.
func (*L0Sublevels) ReadAmplification ¶
func (s *L0Sublevels) ReadAmplification() int
ReadAmplification returns the contribution of L0Sublevels to the read amplification for any particular point key. It is the maximum height of any tracked fileInterval. This is always less than or equal to the number of sublevels.
func (*L0Sublevels) String ¶
func (s *L0Sublevels) String() string
String produces a string containing useful debug information. Useful in test code and debugging.
func (*L0Sublevels) UpdateStateForStartedCompaction ¶
func (s *L0Sublevels) UpdateStateForStartedCompaction(inputs []LevelSlice, isBase bool) error
UpdateStateForStartedCompaction updates internal L0Sublevels state for a recently started compaction. isBase specifies if this is a base compaction; if false, this is assumed to be an intra-L0 compaction. The specified compaction must be involving L0 SSTables. It's assumed that the Compacting and IsIntraL0Compacting fields are already set on all [FileMetadata]s passed in.
type Level ¶
type Level uint32
Level encodes a level and optional sublevel for use in log and error messages. The encoding has the property that Level(0) == L0Sublevel(invalidSublevel).
func L0Sublevel ¶
L0Sublevel returns a Level representing the specified L0 sublevel.
type LevelFile ¶
type LevelFile struct { *FileMetadata // contains filtered or unexported fields }
LevelFile holds a file's metadata along with its position within a level of the LSM.
func (LevelFile) Slice ¶
func (lf LevelFile) Slice() LevelSlice
Slice constructs a LevelSlice containing only this file.
type LevelIterator ¶
type LevelIterator struct {
// contains filtered or unexported fields
}
LevelIterator iterates over a set of files' metadata. Its zero value is an empty iterator.
func (*LevelIterator) Clone ¶
func (i *LevelIterator) Clone() LevelIterator
Clone copies the iterator, returning an independent iterator at the same position.
func (*LevelIterator) Current ¶
func (i *LevelIterator) Current() *FileMetadata
Current returns the item at the current iterator position.
Current is deprecated. Callers should instead use the return value of a positioning operation.
func (*LevelIterator) Filter ¶
func (i *LevelIterator) Filter(keyType KeyType) LevelIterator
Filter clones the iterator and sets the desired KeyType as the key to filter files on.
func (*LevelIterator) First ¶
func (i *LevelIterator) First() *FileMetadata
First seeks to the first file in the iterator and returns it.
func (*LevelIterator) Last ¶
func (i *LevelIterator) Last() *FileMetadata
Last seeks to the last file in the iterator and returns it.
func (*LevelIterator) Next ¶
func (i *LevelIterator) Next() *FileMetadata
Next advances the iterator to the next file and returns it.
func (*LevelIterator) Prev ¶
func (i *LevelIterator) Prev() *FileMetadata
Prev moves the iterator the previous file and returns it.
func (*LevelIterator) SeekGE ¶
func (i *LevelIterator) SeekGE(cmp Compare, userKey []byte) *FileMetadata
SeekGE seeks to the first file in the iterator's file set with a largest user key greater than or equal to the provided user key. The iterator must have been constructed from L1+, because it requires the underlying files to be sorted by user keys and non-overlapping.
func (*LevelIterator) SeekLT ¶
func (i *LevelIterator) SeekLT(cmp Compare, userKey []byte) *FileMetadata
SeekLT seeks to the last file in the iterator's file set with a smallest user key less than the provided user key. The iterator must have been constructed from L1+, because it requires the underlying files to be sorted by user keys and non-overlapping.
func (LevelIterator) String ¶
func (i LevelIterator) String() string
func (*LevelIterator) Take ¶
func (i *LevelIterator) Take() LevelFile
Take constructs a LevelFile containing the file at the iterator's current position. Take panics if the iterator is not currently positioned over a file.
type LevelMetadata ¶
type LevelMetadata struct { // NumVirtual is the number of virtual sstables in the level. NumVirtual uint64 // VirtualSize is the size of the virtual sstables in the level. VirtualSize uint64 // contains filtered or unexported fields }
LevelMetadata contains metadata for all of the files within a level of the LSM.
func (*LevelMetadata) Annotation ¶
func (lm *LevelMetadata) Annotation(annotator Annotator) interface{}
Annotation lazily calculates and returns the annotation defined by Annotator. The Annotator is used as the key for pre-calculated values, so equal Annotators must be used to avoid duplicate computations and cached annotations. Annotation must not be called concurrently, and in practice this is achieved by requiring callers to hold DB.mu.
func (*LevelMetadata) Empty ¶
func (lm *LevelMetadata) Empty() bool
Empty indicates whether there are any files in the level.
func (*LevelMetadata) Find ¶
func (lm *LevelMetadata) Find(cmp base.Compare, m *FileMetadata) *LevelFile
Find finds the provided file in the level if it exists.
func (*LevelMetadata) InvalidateAnnotation ¶
func (lm *LevelMetadata) InvalidateAnnotation(annotator Annotator)
InvalidateAnnotation clears any cached annotations defined by Annotator. The Annotator is used as the key for pre-calculated values, so equal Annotators must be used to clear the appropriate cached annotation. InvalidateAnnotation must not be called concurrently, and in practice this is achieved by requiring callers to hold DB.mu.
func (*LevelMetadata) Iter ¶
func (lm *LevelMetadata) Iter() LevelIterator
Iter constructs a LevelIterator over the entire level.
func (*LevelMetadata) Len ¶
func (lm *LevelMetadata) Len() int
Len returns the number of files within the level.
func (*LevelMetadata) Size ¶ added in v1.1.0
func (lm *LevelMetadata) Size() uint64
Size returns the cumulative size of all the files within the level.
func (*LevelMetadata) Slice ¶
func (lm *LevelMetadata) Slice() LevelSlice
Slice constructs a slice containing the entire level.
type LevelSlice ¶
type LevelSlice struct {
// contains filtered or unexported fields
}
LevelSlice contains a slice of the files within a level of the LSM. A LevelSlice is immutable once created, but may be used to construct a mutable LevelIterator over the slice's files.
LevelSlices should be constructed through one of the existing constructors, not manually initialized.
func NewLevelSliceKeySorted ¶
func NewLevelSliceKeySorted(cmp base.Compare, files []*FileMetadata) LevelSlice
NewLevelSliceKeySorted constructs a LevelSlice over the provided files, sorted by the files smallest keys. TODO(jackson): Can we improve this interface or avoid needing to export a slice constructor like this?
func NewLevelSliceSeqSorted ¶
func NewLevelSliceSeqSorted(files []*FileMetadata) LevelSlice
NewLevelSliceSeqSorted constructs a LevelSlice over the provided files, sorted by the L0 sequence number sort order. TODO(jackson): Can we improve this interface or avoid needing to export a slice constructor like this?
func NewLevelSliceSpecificOrder ¶
func NewLevelSliceSpecificOrder(files []*FileMetadata) LevelSlice
NewLevelSliceSpecificOrder constructs a LevelSlice over the provided files, ordering the files by their order in the provided slice. It's used in tests. TODO(jackson): Update tests to avoid requiring this and remove it.
func (LevelSlice) Each ¶
func (ls LevelSlice) Each(fn func(*FileMetadata))
Each invokes fn for each element in the slice.
func (*LevelSlice) Empty ¶
func (ls *LevelSlice) Empty() bool
Empty indicates whether the slice contains any files.
func (*LevelSlice) Iter ¶
func (ls *LevelSlice) Iter() LevelIterator
Iter constructs a LevelIterator that iterates over the slice.
func (*LevelSlice) Len ¶
func (ls *LevelSlice) Len() int
Len returns the number of files in the slice. Its runtime is constant.
func (*LevelSlice) NumVirtual ¶ added in v1.1.0
func (ls *LevelSlice) NumVirtual() uint64
NumVirtual returns the number of virtual sstables in the level. Its runtime is linear in the length of the slice.
func (LevelSlice) Reslice ¶
func (ls LevelSlice) Reslice(resliceFunc func(start, end *LevelIterator)) LevelSlice
Reslice constructs a new slice backed by the same underlying level, with new start and end positions. Reslice invokes the provided function, passing two LevelIterators: one positioned to i's inclusive start and one positioned to i's inclusive end. The resliceFunc may move either iterator forward or backwards, including beyond the callee's original bounds to capture additional files from the underlying level. Reslice constructs and returns a new LevelSlice with the final bounds of the iterators after calling resliceFunc.
func (*LevelSlice) SizeSum ¶
func (ls *LevelSlice) SizeSum() uint64
SizeSum sums the size of all files in the slice. Its runtime is linear in the length of the slice.
func (*LevelSlice) VirtualSizeSum ¶ added in v1.1.0
func (ls *LevelSlice) VirtualSizeSum() uint64
VirtualSizeSum returns the sum of the sizes of the virtual sstables in the level.
type NewFileEntry ¶
type NewFileEntry struct { Level int Meta *FileMetadata // BackingFileNum is only set during manifest replay, and only for virtual // sstables. BackingFileNum base.DiskFileNum }
NewFileEntry holds the state for a new file or one moved from a different level.
type OrderingInvariants ¶ added in v1.1.0
type OrderingInvariants int8
OrderingInvariants dictates the file ordering invariants active.
const ( // ProhibitSplitUserKeys indicates that adjacent files within a level cannot // contain the same user key. ProhibitSplitUserKeys OrderingInvariants = iota // AllowSplitUserKeys indicates that adjacent files within a level may // contain the same user key. This is only allowed by historical format // major versions. // // TODO(jackson): Remove. AllowSplitUserKeys )
type PhysicalFileMeta ¶ added in v1.1.0
type PhysicalFileMeta struct {
*FileMetadata
}
PhysicalFileMeta is used by functions which want a guarantee that their input belongs to a physical sst and not a virtual sst.
NB: This type should only be constructed by calling FileMetadata.PhysicalMeta.
type TableInfo ¶
type TableInfo struct { // FileNum is the internal DB identifier for the table. FileNum base.FileNum // Size is the size of the file in bytes. Size uint64 // Smallest is the smallest internal key in the table. Smallest InternalKey // Largest is the largest internal key in the table. Largest InternalKey // SmallestSeqNum is the smallest sequence number in the table. SmallestSeqNum uint64 // LargestSeqNum is the largest sequence number in the table. LargestSeqNum uint64 }
TableInfo contains the common information for table related events.
type TableStats ¶
type TableStats struct { // The total number of entries in the table. NumEntries uint64 // The number of point and range deletion entries in the table. NumDeletions uint64 // NumRangeKeySets is the total number of range key sets in the table. // // NB: If there's a chance that the sstable contains any range key sets, // then NumRangeKeySets must be > 0. NumRangeKeySets uint64 // Estimate of the total disk space that may be dropped by this table's // point deletions by compacting them. PointDeletionsBytesEstimate uint64 // Estimate of the total disk space that may be dropped by this table's // range deletions by compacting them. This estimate is at data-block // granularity and is not updated if compactions beneath the table reduce // the amount of reclaimable disk space. It also does not account for // overlapping data in L0 and ignores L0 sublevels, but the error that // introduces is expected to be small. // // Tables in the bottommost level of the LSM may have a nonzero estimate if // snapshots or move compactions prevented the elision of their range // tombstones. A table in the bottommost level that was ingested into L6 // will have a zero estimate, because the file's sequence numbers indicate // that the tombstone cannot drop any data contained within the file itself. RangeDeletionsBytesEstimate uint64 // Total size of value blocks and value index block. ValueBlocksSize uint64 }
TableStats contains statistics on a table used for compaction heuristics, and export via Metrics.
type UserKeyRange ¶
type UserKeyRange struct {
Start, End []byte
}
UserKeyRange encodes a key range in user key space. A UserKeyRange's Start and End boundaries are both inclusive.
type Version ¶
type Version struct { // The level 0 sstables are organized in a series of sublevels. Similar to // the seqnum invariant in normal levels, there is no internal key in a // higher level table that has both the same user key and a higher sequence // number. Within a sublevel, tables are sorted by their internal key range // and any two tables at the same sublevel do not overlap. Unlike the normal // levels, sublevel n contains older tables (lower sequence numbers) than // sublevel n+1. // // The L0Sublevels struct is mostly used for compaction picking. As most // internal data structures in it are only necessary for compaction picking // and not for iterator creation, the reference to L0Sublevels is nil'd // after this version becomes the non-newest version, to reduce memory // usage. // // L0Sublevels.Levels contains L0 files ordered by sublevels. All the files // in Levels[0] are in L0Sublevels.Levels. L0SublevelFiles is also set to // a reference to that slice, as that slice is necessary for iterator // creation and needs to outlast L0Sublevels. L0Sublevels *L0Sublevels L0SublevelFiles []LevelSlice Levels [NumLevels]LevelMetadata // RangeKeyLevels holds a subset of the same files as Levels that contain range // keys (i.e. fileMeta.HasRangeKeys == true). The memory amplification of this // duplication should be minimal, as range keys are expected to be rare. RangeKeyLevels [NumLevels]LevelMetadata // The callback to invoke when the last reference to a version is // removed. Will be called with list.mu held. Deleted func(obsolete []*FileBacking) // Stats holds aggregated stats about the version maintained from // version to version. Stats struct { // MarkedForCompaction records the count of files marked for // compaction within the version. MarkedForCompaction int } // contains filtered or unexported fields }
Version is a collection of file metadata for on-disk tables at various levels. In-memory DBs are written to level-0 tables, and compactions migrate data from level N to level N+1. The tables map internal keys (which are a user key, a delete or set bit, and a sequence number) to user values.
The tables at level 0 are sorted by largest sequence number. Due to file ingestion, there may be overlap in the ranges of sequence numbers contain in level 0 sstables. In particular, it is valid for one level 0 sstable to have the seqnum range [1,100] while an adjacent sstable has the seqnum range [50,50]. This occurs when the [50,50] table was ingested and given a global seqnum. The ingestion code will have ensured that the [50,50] sstable will not have any keys that overlap with the [1,100] in the seqnum range [1,49]. The range of internal keys [fileMetadata.smallest, fileMetadata.largest] in each level 0 table may overlap.
The tables at any non-0 level are sorted by their internal key range and any two tables at the same non-0 level do not overlap.
The internal key ranges of two tables at different levels X and Y may overlap, for any X != Y.
Finally, for every internal key in a table at level X, there is no internal key in a higher level table that has both the same user key and a higher sequence number.
func AccumulateIncompleteAndApplySingleVE ¶ added in v1.1.0
func AccumulateIncompleteAndApplySingleVE( ve *VersionEdit, curr *Version, cmp Compare, formatKey base.FormatKey, flushSplitBytes int64, readCompactionRate int64, backingStateMap map[base.DiskFileNum]*FileBacking, addBackingFunc func(*FileBacking), removeBackingFunc func(base.DiskFileNum), orderingInvariants OrderingInvariants, ) (_ *Version, zombies map[base.DiskFileNum]uint64, _ error)
AccumulateIncompleteAndApplySingleVE should be called if a single version edit is to be applied to the provided curr Version and if the caller needs to update the versionSet.zombieTables map. This function exists separately from BulkVersionEdit.Apply because it is easier to reason about properties regarding BulkVersionedit.Accumulate/Apply and zombie table generation, if we know that exactly one version edit is being accumulated.
Note that the version edit passed into this function may be incomplete because compactions don't have the ref counting information necessary to populate VersionEdit.RemovedBackingTables. This function will complete such a version edit by populating RemovedBackingTables.
Invariant: Any file being deleted through ve must belong to the curr Version. We can't have a delete for some arbitrary file which does not exist in curr.
func NewVersion ¶
func NewVersion( cmp Compare, formatKey base.FormatKey, flushSplitBytes int64, files [NumLevels][]*FileMetadata, ) *Version
NewVersion constructs a new Version with the provided files. It requires the provided files are already well-ordered. It's intended for testing.
func ParseVersionDebug ¶
func ParseVersionDebug( cmp Compare, formatKey base.FormatKey, flushSplitBytes int64, s string, ) (*Version, error)
ParseVersionDebug parses a Version from its DebugString output.
func (*Version) CheckOrdering ¶
func (v *Version) CheckOrdering( cmp Compare, format base.FormatKey, order OrderingInvariants, ) error
CheckOrdering checks that the files are consistent with respect to increasing file numbers (for level 0 files) and increasing and non- overlapping internal key ranges (for level non-0 files).
func (*Version) Contains ¶
func (v *Version) Contains(level int, cmp Compare, m *FileMetadata) bool
Contains returns a boolean indicating whether the provided file exists in the version at the given level. If level is non-zero then Contains binary searches among the files. If level is zero, Contains scans the entire level.
func (*Version) DebugString ¶
DebugString returns an alternative format to String() which includes sequence number and kind information for the sstable boundaries.
func (*Version) InitL0Sublevels ¶
func (v *Version) InitL0Sublevels( cmp Compare, formatKey base.FormatKey, flushSplitBytes int64, ) error
InitL0Sublevels initializes the L0Sublevels
func (*Version) Overlaps ¶
func (v *Version) Overlaps( level int, cmp Compare, start, end []byte, exclusiveEnd bool, ) LevelSlice
Overlaps returns all elements of v.files[level] whose user key range intersects the given range. If level is non-zero then the user key ranges of v.files[level] are assumed to not overlap (although they may touch). If level is zero then that assumption cannot be made, and the [start, end] range is expanded to the union of those matching ranges so far and the computation is repeated until [start, end] stabilizes. The returned files are a subsequence of the input files, i.e., the ordering is not changed.
func (*Version) String ¶
String implements fmt.Stringer, printing the FileMetadata for each level in the Version.
func (*Version) Unref ¶
func (v *Version) Unref()
Unref decrements the version refcount. If the last reference to the version was removed, the version is removed from the list of versions and the Deleted callback is invoked. Requires that the VersionList mutex is NOT locked.
func (*Version) UnrefLocked ¶
func (v *Version) UnrefLocked()
UnrefLocked decrements the version refcount. If the last reference to the version was removed, the version is removed from the list of versions and the Deleted callback is invoked. Requires that the VersionList mutex is already locked.
type VersionEdit ¶
type VersionEdit struct { // ComparerName is the value of Options.Comparer.Name. This is only set in // the first VersionEdit in a manifest (either when the DB is created, or // when a new manifest is created) and is used to verify that the comparer // specified at Open matches the comparer that was previously used. ComparerName string // MinUnflushedLogNum is the smallest WAL log file number corresponding to // mutations that have not been flushed to an sstable. // // This is an optional field, and 0 represents it is not set. MinUnflushedLogNum base.FileNum // ObsoletePrevLogNum is a historic artifact from LevelDB that is not used by // Pebble, RocksDB, or even LevelDB. Its use in LevelDB was deprecated in // 6/2011. We keep it around purely for informational purposes when // displaying MANIFEST contents. ObsoletePrevLogNum uint64 // The next file number. A single counter is used to assign file numbers // for the WAL, MANIFEST, sstable, and OPTIONS files. NextFileNum base.FileNum // LastSeqNum is an upper bound on the sequence numbers that have been // assigned in flushed WALs. Unflushed WALs (that will be replayed during // recovery) may contain sequence numbers greater than this value. LastSeqNum uint64 // A file num may be present in both deleted files and new files when it // is moved from a lower level to a higher level (when the compaction // found that there was no overlapping file at the higher level). DeletedFiles map[DeletedFileEntry]*FileMetadata NewFiles []NewFileEntry // CreatedBackingTables can be used to preserve the FileBacking associated // with a physical sstable. This is useful when virtual sstables in the // latest version are reconstructed during manifest replay, and we also need // to reconstruct the FileBacking which is required by these virtual // sstables. // // INVARIANT: The FileBacking associated with a physical sstable must only // be added as a backing file in the same version edit where the physical // sstable is first virtualized. This means that the physical sstable must // be present in DeletedFiles and that there must be at least one virtual // sstable with the same FileBacking as the physical sstable in NewFiles. A // file must be present in CreatedBackingTables in exactly one version edit. // The physical sstable associated with the FileBacking must also not be // present in NewFiles. CreatedBackingTables []*FileBacking // RemovedBackingTables is used to remove the FileBacking associated with a // virtual sstable. Note that a backing sstable can be removed as soon as // there are no virtual sstables in the latest version which are using the // backing sstable, but the backing sstable doesn't necessarily have to be // removed atomically with the version edit which removes the last virtual // sstable associated with the backing sstable. The removal can happen in a // future version edit. // // INVARIANT: A file must only be added to RemovedBackingTables if it was // added to CreateBackingTables in a prior version edit. The same version // edit also cannot have the same file present in both CreateBackingTables // and RemovedBackingTables. A file must be present in RemovedBackingTables // in exactly one version edit. RemovedBackingTables []base.DiskFileNum }
VersionEdit holds the state for an edit to a Version along with other on-disk state (log numbers, next file number, and the last sequence number).
func (*VersionEdit) DebugString ¶ added in v1.1.0
func (v *VersionEdit) DebugString(fmtKey base.FormatKey) string
DebugString is a more verbose version of String(). Use this in tests.
func (*VersionEdit) Decode ¶
func (v *VersionEdit) Decode(r io.Reader) error
Decode decodes an edit from the specified reader.
Note that the Decode step will not set the FileBacking for virtual sstables and the responsibility is left to the caller. However, the Decode step will populate the NewFileEntry.BackingFileNum in VersionEdit.NewFiles.
func (*VersionEdit) Encode ¶
func (v *VersionEdit) Encode(w io.Writer) error
Encode encodes an edit to the specified writer.
func (*VersionEdit) String ¶
func (v *VersionEdit) String() string
String implements fmt.Stringer for a VersionEdit.
type VersionList ¶
type VersionList struct {
// contains filtered or unexported fields
}
VersionList holds a list of versions. The versions are ordered from oldest to newest.
func (*VersionList) Back ¶
func (l *VersionList) Back() *Version
Back returns the newest version in the list. Note that this version is only valid if Empty() returns true.
func (*VersionList) Empty ¶
func (l *VersionList) Empty() bool
Empty returns true if the list is empty, and false otherwise.
func (*VersionList) Front ¶
func (l *VersionList) Front() *Version
Front returns the oldest version in the list. Note that this version is only valid if Empty() returns true.
func (*VersionList) Init ¶
func (l *VersionList) Init(mu *sync.Mutex)
Init initializes the version list.
func (*VersionList) PushBack ¶
func (l *VersionList) PushBack(v *Version)
PushBack adds a new version to the back of the list. This new version becomes the "newest" version in the list.
func (*VersionList) Remove ¶
func (l *VersionList) Remove(v *Version)
Remove removes the specified version from the list.
type VirtualFileMeta ¶ added in v1.1.0
type VirtualFileMeta struct {
*FileMetadata
}
VirtualFileMeta is used by functions which want a guarantee that their input belongs to a virtual sst and not a physical sst.
A VirtualFileMeta inherits all the same fields as a FileMetadata. These fields have additional invariants imposed on them, and/or slightly varying meanings:
- Smallest and Largest (and their counterparts {Smallest, Largest}{Point,Range}Key) remain tight bounds that represent a key at that exact bound. We make the effort to determine the next smallest or largest key in an sstable after virtualizing it, to maintain this tightness. If the largest is a sentinel key (IsExclusiveSentinel()), it could mean that a rangedel or range key ends at that user key, or has been truncated to that user key.
- One invariant is that if a rangedel or range key is truncated on its upper bound, the virtual sstable *must* have a rangedel or range key sentinel key as its upper bound. This is because truncation yields an exclusive upper bound for the rangedel/rangekey, and if there are any points at that exclusive upper bound within the same virtual sstable, those could get uncovered by this truncation. We enforce this invariant in calls to keyspan.Truncate.
- Size is an estimate of the size of the virtualized portion of this sstable. The underlying file's size is stored in FileBacking.Size, though it could also be estimated or could correspond to just the referenced portion of a file (eg. if the file originated on another node).
- Size must be > 0.
- SmallestSeqNum and LargestSeqNum are loose bounds for virtual sstables. This means that all keys in the virtual sstable must have seqnums within [SmallestSeqNum, LargestSeqNum], however there's no guarantee that there's a key with a seqnum at either of the bounds. Calculating tight seqnum bounds would be too expensive and deliver little value.
NB: This type should only be constructed by calling FileMetadata.VirtualMeta.