rduckdb

package
v0.52.0 Latest
Warning

This package is not in the latest version of its module.
Published: Dec 19, 2024 License: Apache-2.0 Imports: 28 Imported by: 0

README

rduckdb

Motivation

  1. As an embedded database, DuckDB does not inherently provide the same isolation for ETL and serving workloads that other OLAP databases offer.
  2. We have observed significant degradation in query performance during data ingestion.
  3. In a Kubernetes environment, it is recommended to use local disks instead of network disks, necessitating separate local disk backups.

Features

  1. Utilizes separate DuckDB handles for reading and writing, each with distinct CPU and memory resources.
  2. Automatically backs up writes to GCS in real time.
  3. Automatically restores from backups when starting with an empty local disk.

Examples

  1. Refer to examples/main.go for a usage example.

Future Work

  1. Enable writes and reads to be executed on separate machines.
  2. Limit read operations to specific tables to support ephemeral tables (intermediate tables required only for writes).

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CreateTableOptions

type CreateTableOptions struct {
	// View specifies whether the created table is a view.
	View bool
	// If BeforeCreateFn is set, it will be executed before the create query is executed.
	BeforeCreateFn func(ctx context.Context, conn *sqlx.Conn) error
	// If AfterCreateFn is set, it will be executed after the create query is executed.
	AfterCreateFn func(ctx context.Context, conn *sqlx.Conn) error
}

type DB

type DB interface {
	// Close closes the database.
	Close() error

	// AcquireReadConnection returns a connection to the database for reading.
	// Once done, the connection should be released by calling the release function.
	// This connection must only be used for select queries or for creating and working with temporary tables.
	AcquireReadConnection(ctx context.Context) (conn *sqlx.Conn, release func() error, err error)

	// Size returns the size of the database in bytes.
	// It is currently implemented as the sum of the sizes of all serving `.db` files.
	Size() int64

	// CreateTableAsSelect creates a new table by name from the results of the given SQL query.
	CreateTableAsSelect(ctx context.Context, name string, sql string, opts *CreateTableOptions) error

	// MutateTable allows mutating a table in the database by calling the mutateFn.
	MutateTable(ctx context.Context, name string, mutateFn func(ctx context.Context, conn *sqlx.Conn) error) error

	// DropTable removes a table from the database.
	DropTable(ctx context.Context, name string) error

	// RenameTable renames a table in the database.
	RenameTable(ctx context.Context, oldName, newName string) error
}

func NewDB

func NewDB(ctx context.Context, opts *DBOptions) (DB, error)

NewDB creates a new DB instance from the given options. The doc comment also refers to dbIdentifier, a unique identifier for the database reported in metrics, but the signature above has no such parameter.

type DBOptions

type DBOptions struct {
	// LocalPath is the path where local db files will be stored. Should be unique for each database.
	LocalPath string
	// Remote is the blob storage bucket where the database files will be stored. This is the source of truth.
	// The local db will be eventually synced with the remote.
	Remote *blob.Bucket

	// ReadSettings are settings applied to the read DuckDB handle.
	ReadSettings map[string]string
	// WriteSettings are settings applied to the write DuckDB handle.
	WriteSettings map[string]string
	// InitQueries are the queries to run when the database is first created.
	InitQueries []string

	Logger         *slog.Logger
	OtelAttributes []attribute.KeyValue
}
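
Because ReadSettings and WriteSettings are plain string maps, giving each handle its own CPU and memory budget comes down to building two maps. The sketch below splits a machine's resources roughly in half using DuckDB's threads and max_memory settings. splitSettings and the even split are illustrative assumptions, and whether ValidateSettings accepts these particular keys is not stated in this documentation.

```go
package main

import (
	"fmt"
	"strconv"
)

// atLeastOne clamps n so that neither handle ends up with zero resources.
func atLeastOne(n int) int {
	if n < 1 {
		return 1
	}
	return n
}

// splitSettings is a hypothetical helper that divides threads and memory
// between the read and write handles. The keys are DuckDB setting names.
func splitSettings(threads, memoryGB int) (read, write map[string]string) {
	readThreads := atLeastOne(threads / 2)
	writeThreads := atLeastOne(threads - readThreads)
	readMem := atLeastOne(memoryGB / 2)
	writeMem := atLeastOne(memoryGB - readMem)

	read = map[string]string{
		"threads":    strconv.Itoa(readThreads),
		"max_memory": strconv.Itoa(readMem) + "GB",
	}
	write = map[string]string{
		"threads":    strconv.Itoa(writeThreads),
		"max_memory": strconv.Itoa(writeMem) + "GB",
	}
	return read, write
}

func main() {
	read, write := splitSettings(8, 16)
	fmt.Println("ReadSettings: ", read)
	fmt.Println("WriteSettings:", write)
}
```

The two maps would then be assigned to the ReadSettings and WriteSettings fields of DBOptions before calling NewDB.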

func (*DBOptions) ValidateSettings

func (d *DBOptions) ValidateSettings() error
