badger

package
v1.63.0
Published: Nov 10, 2024 License: Apache-2.0 Imports: 24 Imported by: 10

README

Badger data storage

Data modeling

The key design in the badger storage backend takes advantage of badger's sorted key space and its key-only iteration, which does not require loading values from the value log, only the keys in the LSM tree. This is used to implement an efficient inverted index for both exact lookups and range searches. All numeric values encoded into keys must be stored in big-endian order so that badger's byte-wise key comparison matches numeric ordering.
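
For example, with big-endian encoding the byte-wise comparison that badger performs on keys agrees with numeric ordering, which little-endian encoding does not. A minimal illustration (not taken from the Jaeger sources; assumes the standard bytes, encoding/binary, and fmt packages):

a := make([]byte, 8)
b := make([]byte, 8)

// 255 < 256 numerically.
binary.BigEndian.PutUint64(a, 255)
binary.BigEndian.PutUint64(b, 256)
fmt.Println(bytes.Compare(a, b)) // -1: byte order agrees with numeric order

binary.LittleEndian.PutUint64(a, 255)
binary.LittleEndian.PutUint64(b, 256)
fmt.Println(bytes.Compare(a, b)) // 1: byte order disagrees with numeric order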

The index key structure is created in createIndexKey in spanstore/writer.go, and the primary key for spans in createTraceKV.

Primary key design

Primary keys are the only keys that have a value in badger's storage. Each key represents a single span, so a single trace is a collection of key-value tuples. The value is the actual span marshalled into bytes; the marshalling format is indicated by the last 4 bits of the meta encoding byte in the badger entry.

Primary keys are sorted as follows:

  • TraceID High
  • TraceID Low
  • Timestamp
  • SpanID

This allows a quick lookup of a single TraceID by seeking to the prefix 0x80 + TraceID High + TraceID Low and then iterating as long as that prefix is valid. Note that the timestamp ordering does not allow fetching a range of traces in a time range, because keys are sorted by TraceID before the timestamp.
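
The key layout can be sketched as follows (the helper name and exact offsets are illustrative assumptions, not the exact Jaeger code; encoding/binary is assumed):

// buildPrimaryKey illustrates the primary key layout:
// <0x80><TraceID High><TraceID Low><Timestamp><SpanID>, all fields big-endian.
func buildPrimaryKey(traceIDHigh, traceIDLow, startTime, spanID uint64) []byte {
	key := make([]byte, 1+8+8+8+8)
	key[0] = 0x80 // marker byte for the primary (span) key space
	binary.BigEndian.PutUint64(key[1:9], traceIDHigh)
	binary.BigEndian.PutUint64(key[9:17], traceIDLow)
	binary.BigEndian.PutUint64(key[17:25], startTime)
	binary.BigEndian.PutUint64(key[25:33], spanID)
	return key
}

A prefix search for a single trace then uses only the first 17 bytes, the marker byte plus the two TraceID words.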

Index key design

Each index key starts with a single byte that indicates which field is indexed. The last 4 bits of that first byte identify the index, and the first 4 bits are zeroed. This sorts the LSM tree by index field, which allows quicker range queries. After that first byte, each inverted index key is sorted in the following order:

  • Value
  • Timestamp
  • TraceID High
  • TraceID Low

This means that a scan for a single value can continue until it reaches the first timestamp outside the query boundaries and then stop, since all subsequent keys are guaranteed not to be valid.
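
An index key can be sketched as follows (the helper name is an illustrative assumption; the real writer builds these keys in createIndexKey):

// buildIndexKey illustrates the inverted index key layout:
// <indexKeyByte><value><timestamp><TraceID High><TraceID Low>.
func buildIndexKey(indexKey byte, value []byte, timestamp, traceIDHigh, traceIDLow uint64) []byte {
	key := make([]byte, 0, 1+len(value)+8+8+8)
	key = append(key, indexKey) // last 4 bits select the index, first 4 bits zeroed
	key = append(key, value...)
	key = binary.BigEndian.AppendUint64(key, timestamp)
	key = binary.BigEndian.AppendUint64(key, traceIDHigh)
	key = binary.BigEndian.AppendUint64(key, traceIDLow)
	return key
}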

Index searches

If the lookup is for a single TraceID, the logic described in the Primary key design section is used. If instead we have TraceQueryParameters with one or more search keys, the results of multiple index seeks must be combined into an intersection. Each search parameter (each tag is a separate search parameter) is used to scan a single index key, so we iterate the index until <indexKey><value><timestamp> is no longer valid. We do this by checking the <indexKey><value> prefix for exactness and then <timestamp> for range. As long as the prefix is valid, we fetch the keys; once the timestamp goes beyond the maximum timestamp, the iteration stops. The keys are then sorted into TraceID order, instead of their natural key ordering, for the next step.
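
A key-only scan of one index can be sketched with the badger iterator API as follows (the function name, parameter layout, and time-range check are assumptions; imports of encoding/binary and the badger package are assumed):

// scanIndex sketches a key-only index scan: it seeks to <indexKey><value><startTime>
// and keeps iterating while the <indexKey><value> prefix matches and the embedded
// timestamp stays at or below endTime.
func scanIndex(txn *badger.Txn, prefix []byte, startTime, endTime uint64) [][]byte {
	opts := badger.DefaultIteratorOptions
	opts.PrefetchValues = false // key-only iteration, values stay in the value log
	it := txn.NewIterator(opts)
	defer it.Close()

	// Start from the first key whose timestamp can be inside the range.
	seekKey := binary.BigEndian.AppendUint64(append([]byte{}, prefix...), startTime)

	var keys [][]byte
	for it.Seek(seekKey); it.ValidForPrefix(prefix); it.Next() {
		key := it.Item().KeyCopy(nil)
		ts := binary.BigEndian.Uint64(key[len(prefix) : len(prefix)+8])
		if ts > endTime {
			break // later keys can only have larger timestamps, so the scan can stop
		}
		keys = append(keys, key)
	}
	return keys
}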

An exception to the above is the duration index, since queries specify a range of durations rather than an exact value. When scanning it, the prefix search seeks to the starting point <indexKey><minDurationValue> and scans the index until <indexKey><maxDurationValue> is reached. Each key is then checked separately for a valid <timestamp>; the timestamp does not control the seek process, and some keys are ignored because they do not match the given time range.
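
The difference from the other indexes can be sketched like this (again a hypothetical helper, not the exact Jaeger code):

// scanDurationIndex sketches the duration index scan: the iteration runs from
// <indexKey><minDuration> to <indexKey><maxDuration>, and the timestamp is only
// used to filter individual keys, not to drive the seek.
func scanDurationIndex(txn *badger.Txn, indexKey byte, minDuration, maxDuration, startTime, endTime uint64) [][]byte {
	opts := badger.DefaultIteratorOptions
	opts.PrefetchValues = false
	it := txn.NewIterator(opts)
	defer it.Close()

	prefix := []byte{indexKey}
	seekKey := binary.BigEndian.AppendUint64(prefix, minDuration)
	stopKey := binary.BigEndian.AppendUint64(prefix, maxDuration)

	var keys [][]byte
	for it.Seek(seekKey); it.ValidForPrefix(prefix); it.Next() {
		key := it.Item().KeyCopy(nil)
		if bytes.Compare(key[:len(stopKey)], stopKey) > 0 {
			break // past the maximum duration value
		}
		ts := binary.BigEndian.Uint64(key[len(stopKey) : len(stopKey)+8])
		if ts >= startTime && ts <= endTime {
			keys = append(keys, key) // keys outside the time range are skipped, not a stop condition
		}
	}
	return keys
}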

Because each trace is stored as multiple spans, the same TraceID can appear multiple times in an index query. For every index other than the duration index, the IDs arrive in order, so each duplicate is discarded simply by checking whether the previous ID equals the current one. With the duration index the spans can arrive in arbitrary order, so a hash join is used to filter the duplicates.
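
The in-order case can be sketched as a single pass over adjacent entries (a hypothetical helper, assuming the standard bytes package):

// dedupeSorted drops duplicate TraceIDs from a result set that is already
// sorted by TraceID, where duplicates are always adjacent.
func dedupeSorted(ids [][]byte) [][]byte {
	var out [][]byte
	for _, id := range ids {
		if len(out) == 0 || !bytes.Equal(out[len(out)-1], id) {
			out = append(out, id)
		}
	}
	return out
}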

After all the index keys have been scanned, the results are passed to a merge join, where two index query results are compared and only the matching IDs are kept. The next result set is then compared against the result of the previous join, and so forth, until all the index fetches have been processed. The resulting set is the list of TraceIDs that matched all the requirements.
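
Since both inputs are sorted by TraceID at this point, the merge join itself can be sketched as a single linear pass (hypothetical helper):

// mergeJoinIDs intersects two TraceID lists that are both sorted in ascending
// order, keeping only the IDs present in both.
func mergeJoinIDs(left, right [][]byte) [][]byte {
	var out [][]byte
	i, j := 0, 0
	for i < len(left) && j < len(right) {
		switch bytes.Compare(left[i], right[j]) {
		case 0:
			out = append(out, left[i])
			i++
			j++
		case -1:
			i++
		default:
			j++
		}
	}
	return out
}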

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config added in v1.61.0

type Config struct {
	// TTL holds time-to-live configuration for the badger store.
	TTL TTL `mapstructure:"ttl"`
	// Directories contains the configuration for where items are stored. Ephemeral must be
	// set to false for this configuration to take effect.
	Directories Directories `mapstructure:"directories"`
	// Ephemeral, if set to true, will store data in a temporary file system.
	// If set to true, the configuration in Directories is ignored.
	Ephemeral bool `mapstructure:"ephemeral"`
	// SyncWrites, if set to true, will immediately sync all writes to disk. Note that
	// setting this field to true will affect write performance.
	SyncWrites bool `mapstructure:"consistency"`
	// MaintenanceInterval is the regular interval after which a maintenance job is
	// run on the values in the store.
	MaintenanceInterval time.Duration `mapstructure:"maintenance_interval"`
	// MetricsUpdateInterval is the regular interval after which metrics are collected
	// by Jaeger.
	MetricsUpdateInterval time.Duration `mapstructure:"metrics_update_interval"`
	// ReadOnly opens the data store in read-only mode. Multiple instances can open the same
	// store in read-only mode. Values still in the write-ahead-log must be replayed before opening.
	ReadOnly bool `mapstructure:"read_only"`
}

Config is badger's internal configuration data.

func DefaultConfig added in v1.61.0

func DefaultConfig() *Config

func (*Config) AddFlags added in v1.61.0

func (c *Config) AddFlags(flagSet *flag.FlagSet)

AddFlags adds flags for Config.

func (*Config) InitFromViper added in v1.61.0

func (c *Config) InitFromViper(v *viper.Viper, logger *zap.Logger)

InitFromViper initializes Config with properties from viper.

func (*Config) Validate added in v1.61.0

func (c *Config) Validate() error

type Directories added in v1.61.0

type Directories struct {
	// Keys contains the directory in which the keys are stored.
	Keys string `mapstructure:"keys"`
	// Values contains the directory in which the values are stored.
	Values string `mapstructure:"values"`
}

type Factory

type Factory struct {
	Config *Config
	// contains filtered or unexported fields
}

Factory implements storage.Factory for Badger backend.

func NewFactory

func NewFactory() *Factory

NewFactory creates a new Factory.

func NewFactoryWithConfig added in v1.54.0

func NewFactoryWithConfig(
	cfg Config,
	metricsFactory metrics.Factory,
	logger *zap.Logger,
) (*Factory, error)
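
A sketch of constructing a factory from configuration (the directory paths and TTL are arbitrary illustrative values; metrics.NullFactory and zap.NewNop are assumed to come from Jaeger's metrics package and go.uber.org/zap):

cfg := badger.DefaultConfig()
cfg.Ephemeral = false
cfg.Directories.Keys = "/var/lib/jaeger/badger/keys"     // illustrative path
cfg.Directories.Values = "/var/lib/jaeger/badger/values" // illustrative path
cfg.TTL.Spans = 72 * time.Hour

f, err := badger.NewFactoryWithConfig(*cfg, metrics.NullFactory, zap.NewNop())
if err != nil {
	// handle the error
}
defer f.Close()

writer, err := f.CreateSpanWriter() // spanstore.Writer backed by badger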

func (*Factory) AddFlags

func (f *Factory) AddFlags(flagSet *flag.FlagSet)

AddFlags implements plugin.Configurable

func (*Factory) Close

func (f *Factory) Close() error

Close implements io.Closer and closes the underlying storage

func (*Factory) CreateDependencyReader

func (f *Factory) CreateDependencyReader() (dependencystore.Reader, error)

CreateDependencyReader implements storage.Factory

func (*Factory) CreateLock added in v1.52.0

func (*Factory) CreateLock() (distributedlock.Lock, error)

CreateLock implements storage.SamplingStoreFactory

func (*Factory) CreateSamplingStore added in v1.51.0

func (f *Factory) CreateSamplingStore(int) (samplingstore.Store, error)

CreateSamplingStore implements storage.SamplingStoreFactory

func (*Factory) CreateSpanReader

func (f *Factory) CreateSpanReader() (spanstore.Reader, error)

CreateSpanReader implements storage.Factory

func (*Factory) CreateSpanWriter

func (f *Factory) CreateSpanWriter() (spanstore.Writer, error)

CreateSpanWriter implements storage.Factory

func (*Factory) InitFromViper

func (f *Factory) InitFromViper(v *viper.Viper, logger *zap.Logger)

InitFromViper implements plugin.Configurable

func (*Factory) Initialize

func (f *Factory) Initialize(metricsFactory metrics.Factory, logger *zap.Logger) error

Initialize implements storage.Factory

func (*Factory) Purge added in v1.57.0

func (f *Factory) Purge(_ context.Context) error

Purge removes all data from the Factory's underlying Badger store. This function is intended for testing purposes only and should not be used in production environments. Calling Purge in production will result in permanent data loss.

type TTL added in v1.61.0

type TTL struct {
	// Spans holds the amount of time that the span store data is stored.
	// Once this duration has passed for a given key, span store data will
	// no longer be accessible.
	Spans time.Duration `mapstructure:"spans"`
}

