jdb

Version: v0.2.1
Published: Nov 22, 2024 License: MIT Imports: 15 Imported by: 0

README

jdb - the naive timeseries database

jdb is an embeddable Schemaless Timeseries Database, queried in-memory, and with on-disc persistence.

It is deliberately naive and is designed to be 'good-enough'. It won't solve all of your woes, it won't handle petabytes of scale, and it won't make your applications more enterprisey.

It will, however, give you a reasonably quick way of storing timeseries, querying against an index or time range, and provide de-duplication guarantees.

In essence, jdb stores measurements in a series of nested maps, with extra maps acting as indices, and protects writes with a mutex. All writes share the same mutex, which makes it decent and simple enough for relatively few measurement types, or where writes don't need to complete in jig time. And, in fact, that's kind of the point; jdb is designed to be aggressively, immediately consistent and simple for me to maintain.
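
As a rough illustration of that design, here is a minimal sketch of nested maps behind a single mutex. It is not jdb's actual code: the `store` and `point` types, their fields, and the exact nesting order are assumptions made up for this example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// point is a hypothetical, cut-down measurement used only for this sketch.
type point struct {
	When  time.Time
	Value float64
}

// store is a hypothetical stand-in for the layout described above:
// name -> index name -> index value -> points, all behind one mutex.
type store struct {
	mu   sync.Mutex
	data map[string]map[string]map[string][]point
}

func newStore() *store {
	return &store{data: make(map[string]map[string]map[string][]point)}
}

// insert takes the single lock, so every write is serialised and is
// visible to the next reader as soon as it returns.
func (s *store) insert(name, index, value string, p point) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if s.data[name] == nil {
		s.data[name] = make(map[string]map[string][]point)
	}
	if s.data[name][index] == nil {
		s.data[name][index] = make(map[string][]point)
	}
	s.data[name][index][value] = append(s.data[name][index][value], p)
}

// count reports how many points are stored under one index value.
func (s *store) count(name, index, value string) int {
	s.mu.Lock()
	defer s.mu.Unlock()

	return len(s.data[name][index][value])
}

func main() {
	s := newStore()
	s.insert("environment", "device", "kitchen", point{When: time.Now(), Value: 19.7})
	s.insert("environment", "device", "kitchen", point{When: time.Now(), Value: 19.8})

	fmt.Println(s.count("environment", "device", "kitchen")) // 2
	fmt.Println(s.count("environment", "device", "bedroom")) // 0
}
```

The single lock is the trade-off the paragraph above describes: writers queue behind each other, but a read never sees a half-applied write.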

If the above sucks for your use case then that's cool, too; just because I've written this doesn't mean you have to use it.

Measurements

A measurement looks like:

type Measurement struct {
    When       time.Time          `json:"when"`
    Name       string             `json:"name"`
    Dimensions map[string]float64 `json:"dimensions"`
    Labels     map[string]string  `json:"labels"`
    Indices    map[string]string  `json:"indices"`
}

In this struct, the following fields have the following meaning:

  • When: A time.Time giving the point in time this measurement should be plotted against; what that means is up to you. It's used to sort ingested data, meaning that writes can occur in any order.
  • Name: We use Name to group measurements together. It is roughly comparable to a database name in other systems.
  • Dimensions: The actual, numerical, things being measured. These are stored as float64s, but a float is easily coerced to/from more or less any numeric type, so you do you babe
  • Labels: Optional metadata for a measurement. These aren't searchable or orderable and so only really cost whatever space they take up
  • Indices: An index can be used to look up measurements matching specific criteria and, to make that possible, takes up more space in memory. Think about cardinality when sussing out which fields to make Indices and which to make Labels

An example, in JSON, of a measurement from one of my own environmental sensors is:

{
    "when": "2024-11-22T11:46:44.599303882Z",
    "name": "environment",
    "dimensions": {
      "aqi": 3,
      "co2": 806,
      "humidity": 34.83123779296875,
      "temperature": 19.743728637695312,
      "tvoc": 315
    },
    "labels": {
      "device_id": "RP2040",
      "internal_temperature": "28074",
      "uptime": "74482980"
    },
    "indices": {
      "device": "kitchen"
    }
}

(Side note: better go and open a kitchen window, or make sure I haven't left my lunch cooking)
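
Because the struct carries JSON tags, a document like the one above decodes with the standard library alone. The sketch below copies the Measurement struct locally so that it compiles without importing jdb; the `decode` helper is a hypothetical name used only here, and the JSON is a shortened version of the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Measurement mirrors the struct shown above so that this sketch
// compiles without importing jdb.
type Measurement struct {
	When       time.Time          `json:"when"`
	Name       string             `json:"name"`
	Dimensions map[string]float64 `json:"dimensions"`
	Labels     map[string]string  `json:"labels"`
	Indices    map[string]string  `json:"indices"`
}

// decode is a hypothetical helper that parses one JSON document
// into a Measurement.
func decode(raw string) (Measurement, error) {
	var m Measurement
	err := json.Unmarshal([]byte(raw), &m)

	return m, err
}

func main() {
	raw := `{
		"when": "2024-11-22T11:46:44.599303882Z",
		"name": "environment",
		"dimensions": {"co2": 806, "temperature": 19.743728637695312},
		"labels": {"device_id": "RP2040"},
		"indices": {"device": "kitchen"}
	}`

	m, err := decode(raw)
	if err != nil {
		panic(err)
	}

	fmt.Println(m.Name, m.Dimensions["co2"], m.Indices["device"], m.When.Year())
}
```

Note that integer-looking values such as `806` land in Dimensions as float64, matching the coercion point made in the field list.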

Querying

jdb either returns all matching data or, when given the optional *jdb.Options argument, a time slice of it:

type Options struct {
        // From defines the earliest timestamp to return Measurements
        // for. It is inclusive, which is to say that if the time is set
        // to `14:45:00 30th April 2024` and there is a record with that
        // precise timestamp, then that record will be included.
        //
        // This field is ignored if `Since` is set. If this field is unset
        // and To is set then From implies "All data from the start of time"
        From time.Time `json:"from" form:"from"`

        // To defines the latest timestamp to return Measurements for.
        // Similarly to From, if this field is empty and From is set, then
        // the implication is "All records from `From` to the end".
        //
        // If both this field and Since are set, then JDB returns the last
        // `Since` duration _to_ To
        To time.Time `json:"to" form:"to"`

        // Since returns Measurements created within the Duration covered by
        // this field. If `To` is unset, then Since returns up until the
        // current time
        Since time.Duration `json:"since" form:"since"`
}
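
One way to read the rules in those comments is as a function from Options to a pair of bounds. The sketch below is an interpretation, not jdb's implementation: `window` and `inWindow` are hypothetical helpers, a zero time.Time stands for "unbounded", and treating To as inclusive is an assumption (the comments only state that From is).

```go
package main

import (
	"fmt"
	"time"
)

// Options mirrors the fields of the struct above, minus the tags.
type Options struct {
	From  time.Time
	To    time.Time
	Since time.Duration
}

// demoNow is a fixed "current time" so the example is repeatable.
var demoNow = time.Date(2024, 4, 30, 15, 0, 0, 0, time.UTC)

// window is a hypothetical helper that resolves Options into bounds.
// A zero Time on either side means "unbounded" on that side.
func window(o Options, now time.Time) (from, to time.Time) {
	if o.Since > 0 {
		// Since wins over From; it counts back from To, or from
		// now when To is unset.
		to = o.To
		if to.IsZero() {
			to = now
		}

		return to.Add(-o.Since), to
	}

	return o.From, o.To
}

// inWindow reports whether t falls inside the bounds. From is
// inclusive per the comments above; To being inclusive is assumed.
func inWindow(t, from, to time.Time) bool {
	if !from.IsZero() && t.Before(from) {
		return false
	}
	if !to.IsZero() && t.After(to) {
		return false
	}

	return true
}

func main() {
	from, to := window(Options{Since: time.Hour}, demoNow)
	fmt.Println(from.Format(time.RFC3339), to.Format(time.RFC3339))
	fmt.Println(inWindow(from, from, to))
	fmt.Println(inWindow(from.Add(-time.Nanosecond), from, to))
}
```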

jdb provides two major interfaces for querying data:

QueryAll(name string, opts *jdb.Options)

Returns measurements for a given name (so, in the example above, environment), optionally restricted to a time slice.

QueryAllIndex(name, index, indexValue string, opts *jdb.Options)

Returns measurements for a given name where a specific index value matches. For the above JSON example, you might call QueryAllIndex("environment", "device", "kitchen", nil) to grab every measurement from the kitchen device.

Documentation

Constants

const (

	// DefaultIndexName is used for Measurements where an Index
	// hasn't been specified so we can still de-dupe it.
	DefaultIndexName = "_default_index"
)

Variables

var (
	// Logger can be used to log database internal operations for various
	// info statements, or left as the default, which won't log anything
	Logger = slog.New(slog.NewTextHandler(io.Discard, nil))

	// If the save buffer hits `FlushMaxSize` length then
	// flush to disk
	FlushMaxSize = 1_000

	// If the save buffer hasn't been flushed for `FlushMaxDuration` or
	// longer then flush to disk
	FlushMaxDuration = time.Hour

	// ErrNoSuchMeasurement returns when trying to retrieve a Measurement
	// that hasn't been indexed by this JDB instance
	ErrNoSuchMeasurement = errors.New("unknown measurement name")

	// ErrNoSuchIndex returns for calls to QueryAllIndex where the index in
	// question does not exist for the specified Measurement
	ErrNoSuchIndex = errors.New("unknown index")

	// ErrDuplicateMeasurement returns when trying to Insert a Measurement, where
	// there is already a Measurement with the same derived ID
	//
	// These IDs have nanosecond precision against a particular
	// measurement + index name + index value, so receiving this error
	// is a problem, and may point toward a reused or incorrectly set
	// Measurement.When
	ErrDuplicateMeasurement = errors.New("measurement and index combination exist for this timestamp")
)
var (
	ErrEmptyName    = errors.New("measurement name must not be empty")
	ErrNoDimensions = errors.New("measurement has no dimensions")
	ErrFieldInUse   = errors.New("field names must be unique across dimensions, labels, and indices for a given Measurement name")
)
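
The two flush variables above, FlushMaxSize and FlushMaxDuration, describe an either/or policy. A minimal sketch of that check follows, using local stand-ins for the two variables with their default values; `shouldFlush` is a hypothetical helper, not part of jdb.

```go
package main

import (
	"fmt"
	"time"
)

// Local stand-ins for jdb.FlushMaxSize and jdb.FlushMaxDuration,
// using the default values listed above.
var (
	flushMaxSize     = 1_000
	flushMaxDuration = time.Hour
)

// shouldFlush is a hypothetical helper: flush when the buffer has
// reached the size limit, or when the last flush was long enough ago.
func shouldFlush(buffered int, sinceLastFlush time.Duration) bool {
	return buffered >= flushMaxSize || sinceLastFlush >= flushMaxDuration
}

func main() {
	fmt.Println(shouldFlush(10, time.Minute))    // false
	fmt.Println(shouldFlush(1_000, time.Minute)) // true
	fmt.Println(shouldFlush(10, 2*time.Hour))    // true
}
```

Raising both limits far out of reach, as the examples further down do, effectively defers flushing until Close.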

Functions

This section is empty.

Types

type JDB

type JDB struct {
	// contains filtered or unexported fields
}

JDB is an embeddable Schemaless Timeseries Database, queried in-memory, and with on-disc persistence.

It is deliberately naive and is designed to be 'good-enough'. It won't solve all of your woes, it won't handle petabytes of scale, and it won't make your applications more enterprisey.

It will, however, give you a reasonably quick way of storing timeseries, querying against an index or time range, and provide de-duplication guarantees.

func New

func New(file string) (j *JDB, err error)

New returns a JDB from a database file on disk, creating the database file if it doesn't already exist.

New returns errors in the following contexts:

  1. The OS can't open the database file for writing
  2. The file it has opened isn't valid for JDB

This function outputs optional logs, which can be enabled by setting `jdb.Logger` to a valid `*slog.Logger`.

Example (Create_close_reopen_database)
package main

import (
	"fmt"
	"os"

	"github.com/jspc/jdb"
)

func main() {
	f, err := os.CreateTemp("", "")
	if err != nil {
		panic(err)
	}
	f.Close()

	// Effectively disable flushing to disk for the sake of
	// timeliness in this test
	jdb.FlushMaxSize = 1_000_000
	jdb.FlushMaxDuration = 1<<63 - 1

	database, err := jdb.New(f.Name())
	if err != nil {
		panic(err)
	}

	err = database.Insert(&jdb.Measurement{Name: "counters", Dimensions: map[string]float64{"Counter": 1234}})
	if err != nil {
		panic(err)
	}

	// Query database
	m, err := database.QueryAll("counters", nil)
	if err != nil {
		panic(err)
	}

	fmt.Printf("counters: %d\n", len(m))

	// Close database
	database.Close()

	// Reopen, reconcile for same data
	database, err = jdb.New(f.Name())
	if err != nil {
		panic(err)
	}

	m, err = database.QueryAll("counters", nil)
	if err != nil {
		panic(err)
	}

	fmt.Printf("counters: %d\n", len(m))

}
Output:

counters: 1
counters: 1
Example (Create_database_and_query_index)
package main

import (
	"fmt"
	"os"
	"time"

	"github.com/jspc/jdb"
)

func main() {
	f, err := os.CreateTemp("", "")
	if err != nil {
		panic(err)
	}
	f.Close()

	// Effectively disable flushing to disk for the sake of
	// timeliness in this test
	jdb.FlushMaxSize = 1_000_000
	jdb.FlushMaxDuration = 1<<63 - 1

	database, err := jdb.New(f.Name())
	if err != nil {
		panic(err)
	}

	defer database.Close()

	t := time.Time{}
	for i := 0; i < 1000; i++ {
		t = t.Add(time.Minute)

		m := &jdb.Measurement{
			When: t,
			Name: "environmental_monitoring",
			Dimensions: map[string]float64{
				"Temperature": 19.23,
				"Humidity":    52.43234,
				"AQI":         1,
			},
			Labels: map[string]string{
				"sensor_version": "v1.0.1",
				"uptime":         "1h31m6s",
			},
			Indices: map[string]string{
				"location": "living room",
			},
		}

		err = m.Validate()
		if err != nil {
			panic(err)
		}

		err = database.Insert(m)
		if err != nil {
			panic(err)
		}
	}

	// Query an empty index
	measurements, err := database.QueryAllIndex("environmental_monitoring", "location", "bedroom", nil)
	if err != nil {
		panic(err)
	}

	fmt.Printf("measurements where location == bedroom: %d\n", len(measurements))

	// Query an index with items
	measurements, err = database.QueryAllIndex("environmental_monitoring", "location", "living room", new(jdb.Options))
	if err != nil {
		panic(err)
	}

	fmt.Printf("measurements where location == 'living room': %d\n", len(measurements))

}
Output:

measurements where location == bedroom: 0
measurements where location == 'living room': 1000

func (*JDB) Close

func (j *JDB) Close() (err error)

Close closes a JDB, flushing its contents to disk.

func (*JDB) Insert

func (j *JDB) Insert(m *Measurement) (err error)

Insert a Measurement into the database.

Insert does this by performing a handful of tasks:

  1. Calling m.Validate() to ensure the data is correct
  2. Checking whether we've already received this Measurement, erroring if so
  3. Adding the Measurement to the underlying data structure(s)
  4. Updating Measurement metadata (field names, indices, etc.)
  5. Persisting to disk if the write buffer is full, or it's been some time since the last write

Because we're using slices and maps under the hood without intermediate buffers, this call relies on mutexes that may be slow at times.

The upshot of this is that calls to Insert are immediately consistent.
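
The duplicate check and the append (steps 2 and 3) can be sketched as a set lookup under the same lock as the write. This is an illustration under assumptions, not jdb's code: the `db` type, its fields, and the plain string IDs are made up for the example, and validation, metadata, and persistence are left out.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDuplicate stands in for jdb.ErrDuplicateMeasurement.
var errDuplicate = errors.New("measurement and index combination exist for this timestamp")

// db is a hypothetical store holding only what this sketch needs.
type db struct {
	mu   sync.Mutex
	seen map[string]struct{}
	rows []string
}

func newDB() *db {
	return &db{seen: make(map[string]struct{})}
}

// insert rejects an ID that has been seen before, otherwise records
// it. Check and append happen under one lock, so two concurrent
// inserts of the same ID cannot both succeed.
func (d *db) insert(id string) error {
	d.mu.Lock()
	defer d.mu.Unlock()

	if _, ok := d.seen[id]; ok {
		return errDuplicate
	}

	d.seen[id] = struct{}{}
	d.rows = append(d.rows, id)

	return nil
}

func main() {
	d := newDB()

	fmt.Println(d.insert("environment/kitchen/1"))
	fmt.Println(d.insert("environment/kitchen/1"))
	fmt.Println(len(d.rows))
}
```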

func (*JDB) QueryAll

func (j *JDB) QueryAll(name string, opts *Options) (m []*Measurement, err error)

QueryAll queries for a Measurement name, returning all Measurements that fit.

When opts is not nil, the specified time slicing options are used to return a subset of Measurements.

For the purposes of time slicing, setting opts to nil behaves identically to passing an empty value, such as `&jdb.Options{}` or `new(jdb.Options)`, though passing nil saves a chunk of cycles and is, therefore, marginally more efficient.

func (*JDB) QueryAllCSV

func (j *JDB) QueryAllCSV(name string, opts *Options) (b []byte, err error)

QueryAllCSV works identically to `QueryAll` (in fact it calls `QueryAll` under the hood), but returns Measurements as a []byte representation of the generated CSV.

It can be quite expensive for large datasets.

This function can be used to load data into other sources, such as jupyter, or a spreadsheet.

When opts is not nil, the specified time slicing options are used to return a subset of Measurements.

For the purposes of time slicing, setting opts to nil behaves identically to passing an empty value, such as `&jdb.Options{}` or `new(jdb.Options)`, though passing nil saves a chunk of cycles and is, therefore, marginally more efficient.

func (*JDB) QueryAllIndex

func (j *JDB) QueryAllIndex(name, index, indexValue string, opts *Options) (m []*Measurement, err error)

QueryAllIndex queries for a Measurement name, returning all Measurements with a specific Index value.

When opts is not nil, the specified time slicing options are used to return a subset of Measurements.

For the purposes of time slicing, setting opts to nil behaves identically to passing an empty value, such as `&jdb.Options{}` or `new(jdb.Options)`, though passing nil saves a chunk of cycles and is, therefore, marginally more efficient.

func (*JDB) QueryFields

func (j *JDB) QueryFields(measurement string) (fields []string, err error)

QueryFields returns the fields set for a Measurement

type Measurement

type Measurement struct {
	When       time.Time          `json:"when"`
	Name       string             `json:"name"`
	Dimensions map[string]float64 `json:"dimensions"`
	Labels     map[string]string  `json:"labels"`
	Indices    map[string]string  `json:"indices"`
}

A Measurement represents a collection of values and metadata to store against a timestamp.

It contains a timestamp, measurement name, some dimensions, some indices, and some labels.

In our world, a Measurement Name might be analogous to a database name. A Measurement has one or more numerical Dimensions, some labels and some indices.

The only difference between a label and an index is that an index is searchable and a label isn't. Because of this, an index takes up more memory and so isn't always appropriate. If you're never going to need to search for a given string then it's best to use a label for the sake of resources and speed.

Internally, Measurements are deduplicated by deriving a Measurement ID of the format:

id := name + \0x00 + indexName + \0x00 + indexValue + \0x00 + measurement_timestamp_in_nanoseconds + \0x00

and then base64 encoded.

This does mean there's the potential for collisions, should multiple Measurements have the same name, index, and timestamp (to the nanosecond); it's _unlikely_ to happen, but it's possible. With this in mind, indexing on a sensor ID, or something unique to the creator of a Measurement is always smart
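
To make the ID format concrete, here is a hedged reconstruction. `deriveID` is a hypothetical function; the decimal rendering of the nanosecond timestamp and the use of the standard base64 alphabet are assumptions, since the description above doesn't pin either down.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strconv"
	"time"
)

// demoWhen is a fixed timestamp so the example is repeatable.
var demoWhen = time.Date(2024, 11, 22, 11, 46, 44, 599303882, time.UTC)

// deriveID is a hypothetical reconstruction of the format described
// above: NUL-separated name, index name, index value, and timestamp
// in nanoseconds, then base64 encoded.
func deriveID(name, indexName, indexValue string, when time.Time) string {
	raw := name + "\x00" +
		indexName + "\x00" +
		indexValue + "\x00" +
		strconv.FormatInt(when.UnixNano(), 10) + "\x00"

	return base64.StdEncoding.EncodeToString([]byte(raw))
}

func main() {
	kitchen := deriveID("environment", "device", "kitchen", demoWhen)
	again := deriveID("environment", "device", "kitchen", demoWhen)
	bedroom := deriveID("environment", "device", "bedroom", demoWhen)

	fmt.Println(kitchen == again)   // same inputs collide
	fmt.Println(kitchen == bedroom) // a distinct index value does not
}
```

This is why indexing on something unique to the sender helps: two sensors reporting in the same nanosecond still produce different IDs as long as their index values differ.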

func (*Measurement) Validate

func (m *Measurement) Validate() error

Validate returns an error if:

  1. The Measurement name is empty
  2. The Measurement has no Dimensions

If the Measurement has no indices, we create one called `_default_index` with the same value as the Measurement name. This exists purely to make deduplication easier and can be ignored by pretty much everything

Without a name, at least one Dimension, and at least one index, a Measurement is functionally meaningless.
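
A minimal sketch of the behaviour described above, written against a local, cut-down copy of the struct. The `validate` function and the lower-case error and constant names are stand-ins for this example; the sketch covers only the two listed checks and the default index, so it omits anything else the real method may do (such as a check tied to ErrFieldInUse).

```go
package main

import (
	"errors"
	"fmt"
)

// Measurement is a cut-down local copy holding only the fields this
// sketch touches.
type Measurement struct {
	Name       string
	Dimensions map[string]float64
	Indices    map[string]string
}

// Stand-ins for jdb.ErrEmptyName, jdb.ErrNoDimensions, and
// jdb.DefaultIndexName.
var (
	errEmptyName    = errors.New("measurement name must not be empty")
	errNoDimensions = errors.New("measurement has no dimensions")
)

const defaultIndexName = "_default_index"

// validate is a hypothetical version of the documented checks. It
// also fills in the default index when none has been set.
func validate(m *Measurement) error {
	if m.Name == "" {
		return errEmptyName
	}
	if len(m.Dimensions) == 0 {
		return errNoDimensions
	}
	if len(m.Indices) == 0 {
		m.Indices = map[string]string{defaultIndexName: m.Name}
	}

	return nil
}

func main() {
	m := &Measurement{
		Name:       "counters",
		Dimensions: map[string]float64{"Counter": 1234},
	}

	fmt.Println(validate(m))
	fmt.Println(m.Indices[defaultIndexName])
	fmt.Println(validate(&Measurement{}))
}
```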

type Options (added in v0.2.0)

type Options struct {
	// From defines the earliest timestamp to return Measurements
	// for. It is inclusive, which is to say that if the time is set
	// to `14:45:00 30th April 2024` and there is a record with that
	// precise timestamp, then that record will be included.
	//
	// This field is ignored if `Since` is set. If this field is unset
	// and To is set then From implies "All data from the start of time"
	From time.Time `json:"from" form:"from"`

	// To defines the latest timestamp to return Measurements for.
	// Similarly to From, if this field is empty and From is set, then
	// the implication is "All records from `From` to the end".
	//
	// If both this field and Since are set, then JDB returns the last
	// `Since` duration _to_ To
	To time.Time `json:"to" form:"to"`

	// Since returns Measurements created within the Duration covered by
	// this field. If `To` is unset, then Since returns up until the
	// current time
	Since time.Duration `json:"since" form:"since"`
}

Options can be passed to the Query* methods on JDB and allow for slicing Measurements based on timestamps, according to the rules in the field comments above.

Directories

Path Synopsis
examples
