Documentation

Overview

Package ts provides a basic time series datastore on top of the underlying CockroachDB key/value datastore. It is used to serve basic metrics generated by CockroachDB.

Storing time series data is a unique challenge for databases. Time series data is typically generated at an extremely high volume, and is queried by providing a range of time of arbitrary size, which can lead to an enormous amount of data being scanned for a query. Many specialized time series databases already exist to meet these challenges; those solutions are built on top of specialized storage engines which are often unsuitable for general data storage needs, but currently superior to CockroachDB for the purpose of time series data.

However, it is a broad goal of CockroachDB to provide a good experience for developers, and out-of-the-box recording of internal metrics is helpful for small and prototype deployments.

Organization Structure

Time series data is organized on disk according to two basic, sortable properties:

+ Time series name (e.g. "sql.operations.selects")
+ Timestamp

This is optimized for querying data for a single series over multiple timestamps: data for the same series at different timestamps is stored contiguously.

Downsampling

The amount of data produced by time series sampling can be considerable; storing every incoming data point with perfect fidelity can consume a tremendous amount of computing and storage resources.

However, in many use cases perfect fidelity is not necessary; the exact time a sample was taken is unimportant, with the overall trend of the data over time being far more important to analysis than the individual samples.

With this in mind, CockroachDB downsamples data before storing it; the original timestamp for each data point in a series is not recorded. CockroachDB instead divides time into contiguous slots of uniform length (currently 10 seconds); if multiple data points for a series fall in the same slot, only the most recent sample is kept.
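
For illustration, the bucketing works roughly as follows (a minimal Go sketch of the slot arithmetic, not the package's internal code; the constant and helper name are assumptions for illustration):

package main

import (
	"fmt"
	"time"
)

// sampleNanos is the assumed 10-second downsample slot described above.
const sampleNanos = int64(10 * time.Second)

// downsample keeps only one value per 10-second slot. Timestamps are
// nanoseconds since the epoch and are assumed to arrive in order, so the
// last value written to a slot is the most recent sample.
func downsample(timestamps []int64, values []float64) map[int64]float64 {
	slots := make(map[int64]float64)
	for i, t := range timestamps {
		slot := t - t%sampleNanos // start of the 10-second slot
		slots[slot] = values[i]
	}
	return slots
}

func main() {
	// The first two samples fall in the same slot (10s to 20s); only the
	// second survives.
	fmt.Println(downsample(
		[]int64{12e9, 17e9, 25e9},
		[]float64{1, 2, 3},
	))
}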

Slab Storage

In order to use key space efficiently, we pack data for multiple contiguous samples into "slab" values, with data for each slab stored in a CockroachDB key. This is done by again dividing time into contiguous slots, but with a longer duration; this is known as the "slab duration". For example, CockroachDB downsamples its internal data at a resolution of 10 seconds, but stores it with a "slab duration" of 1 hour, meaning that all samples that fall in the same hour are stored at the same key. This strategy helps reduce the number of keys scanned during a query.
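
A sketch of the slab bucketing arithmetic under the durations quoted above (10-second samples, 1-hour slabs); the constants and timestamps are illustrative assumptions, not the package's actual key encoding:

package main

import (
	"fmt"
	"time"
)

const (
	sampleNanos = int64(10 * time.Second) // assumed sample duration
	slabNanos   = int64(1 * time.Hour)    // assumed slab duration
)

func main() {
	// A sample taken at 2016-01-01 22:17:42 UTC.
	sample := time.Date(2016, 1, 1, 22, 17, 42, 0, time.UTC).UnixNano()

	slabStart := sample - sample%slabNanos       // 22:00:00, the timestamp stored in the key
	offset := (sample - slabStart) / sampleNanos // position of the sample within the slab

	fmt.Println("slab start:", time.Unix(0, slabStart).UTC())
	fmt.Println("sample offset within slab:", offset)
	fmt.Println("samples per slab:", slabNanos/sampleNanos) // 360
}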

Source Keys

Another common use case of time series queries is the aggregation of multiple series; for example, you may want to query the same metric (e.g. "queries per second") across multiple machines on a cluster, and aggregate the result.

Specialized time series databases can often aggregate across arbitrary series; CockroachDB, however, is specialized for aggregating the same series across different machines or disks.

This is done by creating a "source key", typically a node or store ID, which is an optional identifier separate from the series name itself. The source key is appended to the key as a suffix, after the series name and timestamp; this means that data from the same series and time period, but from different nodes, is stored contiguously in the key space. Data from all sources in a series can thus be queried in a single scan.

Unused Feature: Multiple resolutions

CockroachDB's time series database has rudimentary support for a planned feature: recording the same series at multiple sample durations, commonly known as a "rollup".

For example, a single series may be recorded with a sample size of 10 seconds, and the same data may also be recorded with a sample size of 1 hour. The 1 hour data carries much less information, but can be queried much faster; this is very useful when querying a series over a very long period of time (e.g. an entire month or year).

A specific sample duration in CockroachDB is known as a Resolution. CockroachDB supports a fixed set of Resolutions; each Resolution has a fixed sample duration and a slab duration. For example, the resolution "Resolution10s" has a sample duration of 10 seconds and a slab duration of 1 hour.

This planned feature slightly informs our key structure (resolution information is encoded in every time series key); however, all time series in CockroachDB are currently recorded at a downsample duration of 10 seconds and a slab duration of 1 hour.
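
As a back-of-the-envelope illustration of the rollup benefit described above (simple arithmetic only, not package code): one month of a single series at a 10-second sample duration holds hundreds of thousands of samples, while the same month at 1 hour holds only 720.

package main

import (
	"fmt"
	"time"
)

func main() {
	month := 30 * 24 * time.Hour

	// Approximate datapoints per series for one month at each sample duration.
	fmt.Println("10s samples:", int64(month/(10*time.Second))) // 259200
	fmt.Println("1h samples: ", int64(month/time.Hour))        // 720
}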

Example

A hypothetical example from CockroachDB: we want to record the available capacity of all stores in the cluster.

The series name is: cockroach.capacity.available

Data points for this series are automatically collected from all stores. When data points are written, they are recorded with a source key of: [store id]

Three stores contain data for this series: stores 1, 2 and 3. These store IDs are arbitrary and may change over time.

Data is recorded for January 1st, 2016 between 10:05 pm and 11:05 pm. The data is recorded at a 10 second resolution.

The data is recorded into keys structurally similar to the following:

tsd.cockroach.capacity.available.10s.403234.1
tsd.cockroach.capacity.available.10s.403234.2
tsd.cockroach.capacity.available.10s.403234.3
tsd.cockroach.capacity.available.10s.403235.1
tsd.cockroach.capacity.available.10s.403235.2
tsd.cockroach.capacity.available.10s.403235.3

Data for each source is stored in two keys: one for the 10 pm hour, and one for the 11 pm hour. Each key contains the tsd prefix, the series name, the resolution (10s), a timestamp representing the hour, and finally the source key (the store ID). The keys will appear in the data store in the order shown above.

(Note that the keys will NOT be exactly as pictured above; they will be encoded in a way that is more efficient, but is not readily human readable.)

Index

Constants

const (
	// URLPrefix is the prefix for all time series endpoints hosted by the
	// server.
	URLPrefix = "/ts/"
)

Variables

This section is empty.

Functions

func MakeDataKey

func MakeDataKey(name string, source string, r Resolution, timestamp int64) roachpb.Key

MakeDataKey creates a time series data key for the given series name, source, Resolution and timestamp. The timestamp is expressed in nanoseconds since the epoch; it will be truncated to an exact multiple of the supplied Resolution's slab duration (see SlabDuration).
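
A sketch of a round trip through the key helpers, reusing the example series from the overview (the import path github.com/cockroachdb/cockroach/pkg/ts is an assumption; DecodeDataKey is documented below):

package main

import (
	"fmt"
	"time"

	"github.com/cockroachdb/cockroach/pkg/ts" // assumed import path for this package
)

func main() {
	// Build a key for the example series, source (store ID) "1", at the
	// 10-second resolution.
	tsNanos := time.Date(2016, 1, 1, 22, 17, 42, 0, time.UTC).UnixNano()
	key := ts.MakeDataKey("cockroach.capacity.available", "1", ts.Resolution10s, tsNanos)

	// Decode it again; the returned timestamp has been truncated to the
	// start of the containing slab (the 22:00 hour).
	name, source, resolution, slabNanos, err := ts.DecodeDataKey(key)
	if err != nil {
		panic(err)
	}
	fmt.Println(name, source, resolution, time.Unix(0, slabNanos).UTC())
}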

Types

type DB

type DB struct {
	// contains filtered or unexported fields
}

DB provides Cockroach's Time Series API.

func NewDB

func NewDB(db *client.DB) *DB

NewDB creates a new DB instance.

func (*DB) ContainsTimeSeries

func (tsdb *DB) ContainsTimeSeries(start, end roachpb.RKey) bool

ContainsTimeSeries returns true if the given key range overlaps the range of possible time series keys.

func (*DB) PollSource

func (db *DB) PollSource(
	ambient log.AmbientContext,
	source DataSource,
	frequency time.Duration,
	r Resolution,
	stopper *stop.Stopper,
)

PollSource begins a Goroutine which periodically queries the supplied DataSource for time series data, storing the returned data in the server. Stored data will be sampled using the provided Resolution. The polling process will continue until the provided stop.Stopper is stopped.
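
A minimal sketch of wiring a DataSource (defined below) into PollSource; the import paths, the log.AmbientContext and stop.Stopper usage, and the tspb field names are assumptions for illustration, not verified against this version of the repository:

package main

import (
	"time"

	"github.com/cockroachdb/cockroach/pkg/ts"
	"github.com/cockroachdb/cockroach/pkg/ts/tspb"
	"github.com/cockroachdb/cockroach/pkg/util/log"
	"github.com/cockroachdb/cockroach/pkg/util/stop"
)

// capacitySource is a hypothetical DataSource that reports a single metric.
type capacitySource struct{}

func (capacitySource) GetTimeSeriesData() []tspb.TimeSeriesData {
	return []tspb.TimeSeriesData{{
		Name:   "cockroach.capacity.available",
		Source: "1", // store ID used as the source key
		Datapoints: []tspb.TimeSeriesDatapoint{{
			TimestampNanos: time.Now().UnixNano(),
			Value:          42,
		}},
	}}
}

func poll(db *ts.DB) {
	stopper := stop.NewStopper()
	// Poll the source every 10 seconds, storing results at the 10-second
	// resolution, until the stopper is stopped.
	db.PollSource(log.AmbientContext{}, capacitySource{}, 10*time.Second, ts.Resolution10s, stopper)
}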

func (*DB) PruneTimeSeries

func (tsdb *DB) PruneTimeSeries(
	ctx context.Context,
	snapshot engine.Reader,
	start, end roachpb.RKey,
	db *client.DB,
	timestamp hlc.Timestamp,
) error

PruneTimeSeries prunes old data for any time series found in the supplied key range.

The snapshot should be supplied by a local store, and is used only to discover the names of time series which are stored in that snapshot. The KV client is then used to prune old data from the discovered series.

The snapshot is used for key discovery (as opposed to the KV client) because the task of pruning time series is distributed across the cluster to the individual ranges which contain that time series data. Because replicas of those ranges are guaranteed to have time series data locally, we can use the snapshot to quickly obtain a set of keys to be pruned with no network calls.

func (*DB) Query

func (db *DB) Query(
	ctx context.Context,
	query tspb.Query,
	queryResolution Resolution,
	sampleDuration, startNanos, endNanos int64,
) ([]tspb.TimeSeriesDatapoint, []string, error)

Query returns datapoints for the named time series during the supplied time span. Data is returned as a series of consecutive data points.

Raw data is queried only at the queryResolution supplied: if data for the named time series is not stored at the given resolution, an empty result will be returned.

Raw data is converted into query results through a number of processing steps, which are executed in the following order:

1. Downsampling
2. Rate calculation (if requested)
3. Interpolation and Aggregation

Raw data stored on the server is already downsampled into samples with interval length queryResolution.SampleDuration(); however, result data can be further downsampled into longer sample intervals based on the provided sampleDuration. sampleDuration must be a positive integer multiple of the queryResolution's sample duration. The downsampling operation can compute a sum, total, max or min. Each downsampled datapoint's timestamp falls in the middle of the sample period it represents.

After downsampling, values can be converted into a rate if requested by the query. Each data point's value is replaced by the derivative of the series at that timestamp, computed by comparing the datapoint to its predecessor. If a query requests a derivative, the returned value for each datapoint is expressed in units per second.

If data for the named time series was collected from multiple sources, each returned datapoint represents the sum of datapoints from all sources at the same time. The returned string slice contains a list of all sources for the metric that were aggregated to produce the result. In the case where one series is missing a data point that is present in other series, the missing data points for that series will be interpolated using linear interpolation.
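
A sketch of a call to Query based on the signature above; the tspb.Query literal assumes a plain Name string field and leaves every other query option at its default:

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/cockroachdb/cockroach/pkg/ts"
	"github.com/cockroachdb/cockroach/pkg/ts/tspb"
)

func queryCapacity(ctx context.Context, db *ts.DB) error {
	end := time.Now().UnixNano()
	start := end - int64(time.Hour)

	// Downsample the stored 10-second samples into 1-minute result samples;
	// the sample duration must be a positive integer multiple of the
	// resolution's sample duration.
	datapoints, sources, err := db.Query(
		ctx,
		tspb.Query{Name: "cockroach.capacity.available"},
		ts.Resolution10s,
		int64(time.Minute), // sampleDuration
		start, end,
	)
	if err != nil {
		return err
	}
	fmt.Println(len(datapoints), "datapoints aggregated from sources", sources)
	return nil
}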

func (*DB) StoreData

func (db *DB) StoreData(ctx context.Context, r Resolution, data []tspb.TimeSeriesData) error

StoreData writes the supplied time series data to the cockroach server. Stored data will be sampled at the supplied resolution.
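
A corresponding sketch for writes (the tspb field names are assumptions, mirroring the DataSource sketch above):

package main

import (
	"context"
	"time"

	"github.com/cockroachdb/cockroach/pkg/ts"
	"github.com/cockroachdb/cockroach/pkg/ts/tspb"
)

// storeOne writes a single datapoint at the 10-second resolution.
func storeOne(ctx context.Context, db *ts.DB) error {
	return db.StoreData(ctx, ts.Resolution10s, []tspb.TimeSeriesData{{
		Name:   "cockroach.capacity.available",
		Source: "2", // store ID used as the source key
		Datapoints: []tspb.TimeSeriesDatapoint{{
			TimestampNanos: time.Now().UnixNano(),
			Value:          17,
		}},
	}})
}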

type DataSource

type DataSource interface {
	GetTimeSeriesData() []tspb.TimeSeriesData
}

A DataSource can be queried for a slice of time series data.

type Resolution

type Resolution int64

Resolution is used to enumerate the different resolution values supported by Cockroach.

const (
	// Resolution10s stores data with a sample resolution of 10 seconds.
	Resolution10s Resolution = 1
)

Resolution enumeration values are directly serialized and persisted into system keys; these values must never be altered or reordered.

func DecodeDataKey

func DecodeDataKey(key roachpb.Key) (string, string, Resolution, int64, error)

DecodeDataKey decodes a time series key into its components.

func (Resolution) PruneThreshold

func (r Resolution) PruneThreshold() int64

PruneThreshold returns the pruning threshold duration for this resolution, expressed in nanoseconds. This duration determines how old time series data must be before it is eligible for pruning.

func (Resolution) SampleDuration

func (r Resolution) SampleDuration() int64

SampleDuration returns the sample duration corresponding to this resolution value, expressed in nanoseconds.

func (Resolution) SlabDuration

func (r Resolution) SlabDuration() int64

SlabDuration returns the slab duration corresponding to this resolution value, expressed in nanoseconds. The slab duration determines how many consecutive samples are stored in a single Cockroach key/value.
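
For the only currently-used resolution, these two durations imply 360 samples per slab; a tiny sketch (the import path is an assumption):

package main

import (
	"fmt"

	"github.com/cockroachdb/cockroach/pkg/ts"
)

func main() {
	// Number of consecutive samples packed into one slab at the 10-second
	// resolution: 1 hour / 10 seconds = 360.
	fmt.Println(ts.Resolution10s.SlabDuration() / ts.Resolution10s.SampleDuration())
}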

func (Resolution) String

func (r Resolution) String() string

type Server

type Server struct {
	log.AmbientContext
	// contains filtered or unexported fields
}

Server handles incoming external requests related to time series data.

func MakeServer

func MakeServer(
	ambient log.AmbientContext, db *DB, cfg ServerConfig, stopper *stop.Stopper,
) Server

MakeServer instantiates a new Server which services requests with data from the supplied DB.

func (*Server) Query

Query is an endpoint that returns data for one or more metrics over a specific time span.

func (*Server) RegisterGateway

func (s *Server) RegisterGateway(
	ctx context.Context, mux *gwruntime.ServeMux, conn *grpc.ClientConn,
) error

RegisterGateway starts the gateway (i.e. reverse proxy) that proxies HTTP requests to the appropriate gRPC endpoints.

func (*Server) RegisterService

func (s *Server) RegisterService(g *grpc.Server)

RegisterService registers the gRPC service.

type ServerConfig

type ServerConfig struct {
	// The maximum number of query workers used by the server. If this
	// value is zero, a default non-zero value is used instead.
	QueryWorkerMax int
}

ServerConfig provides a means for tests to override settings in the time series server.

Directories

Path    Synopsis
tspb    Package tspb is a generated protocol buffer package.
