README
The duckdb driver has a number of complexities that can be removed as we change our logic or as DuckDB evolves:
- DuckDB runs into a number of issues on concurrent reads and writes, such as the wal file blowing up in size. To solve this, we write every table to a separate .db file and attach it to the main db. Relevant issue: https://github.com/duckdb/duckdb/issues/9150 (note: it is not fully fixed as of writing). We call this external table storage.
- DuckDB doesn't free storage when a table is dropped, and it also cannot reuse that space, so the db file keeps growing as sources are refreshed. The external table storage fix above also solves this, since every new table is created in a new file.
- DuckDB can sometimes run into internal errors after which every new query fails, so we need to reopen db handles when this happens. Check reopenDB() in runtime/drivers/duckdb/duckdb.go.
- varchar columns can take up more space and are inefficient to query because of DuckDB's lightweight compression. If the cardinality of such a column is low, we can convert it to an enum to improve performance. More details are in this Notion doc: https://www.notion.so/rilldata/Converting-low-cardinality-VARCHAR-dimensions-to-ENUMs-a07ca0a26bca4338a6f941c2604e9f62?pvs=4
- DuckDB views behave unexpectedly when the view definition uses * and the order of the columns in the underlying table changes. See Test_connection_ChangingOrder in runtime/drivers/duckdb/olap_crud_test.go for an example. To mitigate this, we expand * to all columns in sorted order in the view; see generateSelectQuery in runtime/drivers/duckdb/olap.go. Since this changes the column order, we enable it only on cloud, because users modelling locally may care about the original order.
  - We use allow_host_access as a proxy for whether we are running locally or on cloud, which is a hack we would like to remove in future.
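A minimal Go sketch of the column-ordering mitigation, assuming the column names are already known (for example from information_schema). viewSelect and quoteIdent are hypothetical helpers, not the driver's generateSelectQuery, and the sortColumns parameter stands in for the cloud-only decision the driver derives from allow_host_access.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// quoteIdent wraps an identifier in double quotes and doubles any embedded
// quotes (standard SQL identifier escaping).
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// viewSelect builds the SELECT used as a view definition over table.
// HYPOTHETICAL helper for illustration only. When sortColumns is false
// (e.g. local modelling), it keeps "*" so the original column order shows.
func viewSelect(table string, columns []string, sortColumns bool) string {
	if !sortColumns {
		return "SELECT * FROM " + quoteIdent(table)
	}
	sorted := append([]string(nil), columns...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	quoted := make([]string, len(sorted))
	for i, c := range sorted {
		quoted[i] = quoteIdent(c)
	}
	return "SELECT " + strings.Join(quoted, ", ") + " FROM " + quoteIdent(table)
}

func main() {
	cols := []string{"revenue", "country", "day"}
	fmt.Println(viewSelect("sales", cols, true))  // SELECT "country", "day", "revenue" FROM "sales"
	fmt.Println(viewSelect("sales", cols, false)) // SELECT * FROM "sales"
}
```

Because the view lists its columns explicitly and in a fixed order, its output order no longer depends on the physical column order of the underlying table.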
A few others are also listed in comments in the code.
Documentation
Index
- func NewFileStoreToDuckDB(from drivers.FileStore, to drivers.OLAPStore, logger *zap.Logger) drivers.Transporter
- func NewObjectStoreToDuckDB(from drivers.ObjectStore, to drivers.OLAPStore, logger *zap.Logger) drivers.Transporter
- func NewWarehouseToDuckDB(from drivers.Warehouse, to drivers.OLAPStore, logger *zap.Logger) drivers.Transporter
- func RowsToSchema(r *sqlx.Rows) (*runtimev1.StructType, error)
- type Driver
- func (d Driver) HasAnonymousSourceAccess(ctx context.Context, src map[string]any, logger *zap.Logger) (bool, error)
- func (d Driver) Open(instanceID string, cfgMap map[string]any, st *storage.Client, ...) (drivers.Handle, error)
- func (d Driver) Spec() drivers.Spec
- func (d Driver) TertiarySourceConnectors(ctx context.Context, src map[string]any, logger *zap.Logger) ([]string, error)
- type ModelInputProperties
- type ModelOutputProperties
- type ModelResultProperties
Constants
This section is empty.
Variables
This section is empty.
Functions
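func NewFileStoreToDuckDB added in v0.38.0
func NewFileStoreToDuckDB(from drivers.FileStore, to drivers.OLAPStore, logger *zap.Logger) drivers.Transporter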
func NewObjectStoreToDuckDB added in v0.38.0
func NewObjectStoreToDuckDB(from drivers.ObjectStore, to drivers.OLAPStore, logger *zap.Logger) drivers.Transporter
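func NewWarehouseToDuckDB added in v0.48.0
func NewWarehouseToDuckDB(from drivers.Warehouse, to drivers.OLAPStore, logger *zap.Logger) drivers.Transporter
func RowsToSchema added in v0.39.0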
func RowsToSchema(r *sqlx.Rows) (*runtimev1.StructType, error)
Types
type Driver
type Driver struct {
// contains filtered or unexported fields
}
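func (Driver) HasAnonymousSourceAccess added in v0.30.0
func (d Driver) HasAnonymousSourceAccess(ctx context.Context, src map[string]any, logger *zap.Logger) (bool, error)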
type ModelInputProperties added in v0.45.0
type ModelInputProperties struct {
	SQL      string `mapstructure:"sql"`
	Args     []any  `mapstructure:"args"`
	PreExec  string `mapstructure:"pre_exec"`
	PostExec string `mapstructure:"post_exec"`
}
func (*ModelInputProperties) Validate added in v0.45.0
func (p *ModelInputProperties) Validate() error
type ModelOutputProperties added in v0.45.0
type ModelOutputProperties struct {
	Table               string                      `mapstructure:"table"`
	Materialize         *bool                       `mapstructure:"materialize"`
	UniqueKey           []string                    `mapstructure:"unique_key"`
	IncrementalStrategy drivers.IncrementalStrategy `mapstructure:"incremental_strategy"`
}
func (*ModelOutputProperties) Validate added in v0.45.0
func (p *ModelOutputProperties) Validate(opts *drivers.ModelExecuteOptions) error
type ModelResultProperties added in v0.45.0
Source Files
- catalogv2.go
- config.go
- context.go
- duckdb.go
- information_schema.go
- migrate.go
- model_executor_https_self.go
- model_executor_localfile_self.go
- model_executor_self.go
- model_executor_self_file.go
- model_executor_sqlstore_self.go
- model_executor_warehouse_self.go
- model_manager.go
- olap.go
- transporter_duckDB_to_duckDB.go
- transporter_filestore_to_duckDB.go
- transporter_motherduck_to_duckDB.go
- transporter_objectStore_to_duckDB.go
- transporter_warehouse_to_duckDB.go
- utils.go