Documentation
Overview
Package bodkin is a Go library for generating schemas and decoding generic map values and native Go structures to Apache Arrow. The goal is to provide a useful toolkit that makes it easier to use Arrow, and by extension Parquet, with data whose shape is evolving or not strictly defined.
Index
- Variables
- type Bodkin
- func (u *Bodkin) Changes() error
- func (u *Bodkin) Count() int
- func (u *Bodkin) CountPaths() int
- func (u *Bodkin) CountPending() int
- func (u *Bodkin) Err() []Field
- func (u *Bodkin) ExportSchemaBytes() ([]byte, error)
- func (u *Bodkin) ExportSchemaFile(exportPath string) error
- func (u *Bodkin) ImportSchemaBytes(dat []byte) (*arrow.Schema, error)
- func (u *Bodkin) ImportSchemaFile(importPath string) (*arrow.Schema, error)
- func (u *Bodkin) LastSchema() (*arrow.Schema, error)
- func (u *Bodkin) MaxCount() int
- func (u *Bodkin) NewReader(opts ...reader.Option) (*reader.DataReader, error)
- func (u *Bodkin) Opts() []Option
- func (u *Bodkin) OriginSchema() (*arrow.Schema, error)
- func (u *Bodkin) Paths() []Field
- func (u *Bodkin) ResetCount() int
- func (u *Bodkin) ResetMaxCount() int
- func (u *Bodkin) Schema() (*arrow.Schema, error)
- func (u *Bodkin) Unify(a any) error
- func (u *Bodkin) UnifyAtPath(a any, mergeAt string) error
- func (u *Bodkin) UnifyScan() error
- type Field
- type Option
Constants
This section is empty.
Variables
var (
	ErrUndefinedInput            = errors.New("nil input")
	ErrInvalidInput              = errors.New("invalid input")
	ErrNoLatestSchema            = errors.New("no second input has been provided")
	ErrUndefinedFieldType        = errors.New("could not determine type of unpopulated field")
	ErrUndefinedArrayElementType = errors.New("could not determine element type of empty array")
	ErrNotAnUpgradableType       = errors.New("is not an upgradable type")
	ErrPathNotFound              = errors.New("path not found")
	ErrFieldTypeChanged          = errors.New("changed")
	ErrFieldAdded                = errors.New("added")
)
Schema evaluation/evolution errors.
var UpgradableTypes []arrow.Type = []arrow.Type{
	arrow.INT8,
	arrow.UINT8,
	arrow.INT16,
	arrow.UINT16,
	arrow.INT32,
	arrow.UINT64,
	arrow.INT64,
	arrow.FLOAT16,
	arrow.FLOAT32,
	arrow.FLOAT64,
	arrow.DATE32,
	arrow.TIME64,
	arrow.TIMESTAMP,
}
UpgradableTypes are scalar types that can be upgraded to a more flexible type.
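To make "upgraded to a more flexible type" concrete, here is a minimal sketch of a widening rule. The kind type and the ordering int64 < float64 < string are assumptions chosen for illustration; the package's actual conversion rules are not described on this page.

```go
package main

import "fmt"

// kind is a hypothetical, simplified stand-in for arrow.Type.
type kind int

const (
	kindInt64 kind = iota
	kindFloat64
	kindString
)

// String makes kinds print by name.
func (k kind) String() string {
	return [...]string{"int64", "float64", "string"}[k]
}

// upgrade returns the more flexible of two kinds. The ordering
// int64 < float64 < string is an assumption made for this sketch.
func upgrade(a, b kind) kind {
	if b > a {
		return b
	}
	return a
}

func main() {
	fmt.Println(upgrade(kindInt64, kindFloat64)) // float64
	fmt.Println(upgrade(kindString, kindInt64))  // string
}
```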
Functions
This section is empty.
Types
type Bodkin
type Bodkin struct {
	Reader *reader.DataReader
	// contains filtered or unexported fields
}
Bodkin is a collection of field paths describing the columns of one or more structured inputs.
func NewBodkin
NewBodkin returns a new Bodkin value from a structured input. Input must be a JSON byte slice or string, a Go struct with exported fields, or a map[string]any. Any unpopulated fields, empty objects or empty slices in JSON or map[string]any inputs are skipped, as their types cannot be evaluated and converted.
func (*Bodkin) Changes
Changes returns, as an error value, the field additions and field type conversions made during the lifetime of the Bodkin object.
func (*Bodkin) Count added in v0.2.0
Count returns the number of datum evaluated for schema to date.
func (*Bodkin) CountPaths added in v0.2.0
CountPaths returns the count of evaluated field paths.
func (*Bodkin) CountPending added in v0.2.0
CountPending returns the count of unevaluated field paths.
func (*Bodkin) ExportSchemaBytes added in v0.2.5
ExportSchemaBytes exports a serialized Arrow Schema.
func (*Bodkin) ExportSchemaFile added in v0.2.5
ExportSchemaFile exports a serialized Arrow Schema to a file.
func (*Bodkin) ImportSchemaBytes added in v0.2.5
ImportSchemaBytes imports a serialized Arrow Schema.
func (*Bodkin) ImportSchemaFile added in v0.2.5
ImportSchemaFile imports a serialized Arrow Schema from a file.
func (*Bodkin) LastSchema
LastSchema returns the Arrow schema generated from the structure/types of the most recent input. Any unpopulated fields, empty objects or empty slices are skipped. ErrNoLatestSchema is returned if Unify() has never been called. A panic recovery error is returned if the schema could not be created.
func (*Bodkin) MaxCount added in v0.2.4
MaxCount returns the maximum number of datum to be evaluated for schema.
func (*Bodkin) OriginSchema
OriginSchema returns the original Arrow schema generated from the structure/types of the initial input, and a panic recovery error if the schema could not be created.
func (*Bodkin) Paths added in v0.2.0
Paths returns a slice of dotpaths of fields successfully evaluated to date.
func (*Bodkin) ResetCount added in v0.2.4
ResetCount resets the count of datum evaluated for schema to date.
func (*Bodkin) ResetMaxCount added in v0.2.4
ResetMaxCount resets the maximum number of datum to be evaluated for schema to maxInt64.
func (*Bodkin) Schema
Schema returns the current merged Arrow schema generated from the structure/types of the input(s), and a panic recovery error if the schema could not be created. If the Bodkin has a Reader and the schema has been updated since its creation, the Reader will be replaced with a new one matching the current schema. Any
func (*Bodkin) Unify
Unify merges the structured input's column definitions with the schema of the previous input(s). Any unpopulated fields, empty objects or empty slices in JSON input are skipped.
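As a rough mental model of this merge, the following stdlib-only sketch unifies two flat path-to-type maps and reports added and changed fields. The schema type and the unify function are hypothetical simplifications. On a type conflict the sketch keeps the existing type, which is an assumption: how the package resolves conflicts depends on options such as WithTypeConversion.

```go
package main

import (
	"fmt"
	"sort"
)

// schema maps a dotpath to a type name. It is a simplified stand-in for
// an Arrow schema, used only in this sketch.
type schema map[string]string

// unify merges next into cur and returns a sorted description of each
// change. New paths are added; conflicting types are reported but the
// existing type is kept.
func unify(cur, next schema) []string {
	var changes []string
	for path, typ := range next {
		old, ok := cur[path]
		switch {
		case !ok:
			cur[path] = typ
			changes = append(changes, path+": added")
		case old != typ:
			changes = append(changes, path+": changed "+old+" -> "+typ)
		}
	}
	sort.Strings(changes)
	return changes
}

func main() {
	cur := schema{"id": "int64", "name": "string"}
	next := schema{"id": "float64", "email": "string"}
	for _, c := range unify(cur, next) {
		fmt.Println(c)
	}
}
```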
func (*Bodkin) UnifyAtPath added in v0.2.0
UnifyAtPath merges the structured input's column definitions with the schema of the previous input(s), using a specified valid path as the root. An error is returned if the mergeAt path is not found. Any unpopulated fields, empty objects or empty slices in JSON input are skipped.
type Field added in v0.2.0
type Field struct {
	Dotpath string     `json:"dotpath"`
	Type    arrow.Type `json:"arrow_type"`
	// Number of child fields if a nested type
	Childen int `json:"children,omitempty"`
	// Evaluation failure reason
	Issue error `json:"issue,omitempty"`
}
Field represents an element in the input data.
type Option
type Option func(config)
Option configures a Bodkin.
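Option follows the functional options pattern. The sketch below shows the general shape of that pattern with hypothetical names (settings, option, withMaxCount, withTypeConversion, newSettings); none of them are the package's own identifiers, and the package's unexported config type is not shown on this page.

```go
package main

import "fmt"

// settings is a hypothetical configuration struct.
type settings struct {
	maxCount       int
	typeConversion bool
}

// option mutates settings; it mirrors the shape of a functional option.
type option func(*settings)

// withMaxCount caps the number of evaluations in this sketch.
func withMaxCount(n int) option {
	return func(s *settings) { s.maxCount = n }
}

// withTypeConversion switches on a boolean feature flag.
func withTypeConversion() option {
	return func(s *settings) { s.typeConversion = true }
}

// newSettings applies each option in order on top of the defaults.
func newSettings(opts ...option) *settings {
	s := &settings{maxCount: -1} // -1 means "no cap" in this sketch
	for _, o := range opts {
		o(s)
	}
	return s
}

func main() {
	s := newSettings(withMaxCount(100), withTypeConversion())
	fmt.Println(s.maxCount, s.typeConversion) // 100 true
}
```

The pattern keeps the constructor signature stable as options are added, which suits a package whose options have grown across releases.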
func WithIOReader added in v0.3.0
WithIOReader provides an io.Reader for a Bodkin to use with UnifyScan(), along with a delimiter used to split datum in the data stream. The default delimiter is '\n' if none is provided.
func WithInferTimeUnits
func WithInferTimeUnits() Option
WithInferTimeUnits() enables scanning input string values for time, date and timestamp types.
Times use a format of HH:MM or HH:MM:SS[.zzz] where the fractions of a second cannot exceed the precision allowed by the time unit, otherwise unmarshalling will error.
Dates use YYYY-MM-DD format.
Timestamps use RFC3339Nano format but without a timezone. All of the following are valid:
- YYYY-MM-DD
- YYYY-MM-DD[T]HH
- YYYY-MM-DD[T]HH:MM
- YYYY-MM-DD[T]HH:MM:SS[.zzzzzzzzzz]
func WithMaxCount added in v0.2.4
WithMaxCount enables capping the number of Unify evaluations.
func WithQuotedValuesAreStrings added in v0.1.2
func WithQuotedValuesAreStrings() Option
WithQuotedValuesAreStrings enables treating quoted values as strings.
func WithTypeConversion
func WithTypeConversion() Option
WithTypeConversion enables upgrading the column types to fix compatibility conflicts.
Directories
Path | Synopsis |
---|---|
reader | Package reader contains helpers for reading data and loading to Arrow. |