parquetquery

package
v0.0.0-...-39c03fc
Warning

This package is not in the latest version of its module.

Published: Aug 5, 2024 License: AGPL-3.0 Imports: 13 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CompareRowNumbers

func CompareRowNumbers(upToDefinitionLevel int, a, b RowNumber) int

CompareRowNumbers compares the sequences of row numbers in a and b for partial equality, descending from top-level through the given definition level. For example, definition level 1 means that row numbers are compared at two levels of nesting, the top-level and 1 level of nesting below.
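The partial comparison described above can be illustrated with a minimal sketch. The helper compareUpTo and its -1/0/1 return convention are assumptions made for illustration; this is not the package's implementation.

```go
package main

import "fmt"

// RowNumber mirrors the [6]int64 shape documented for this package.
type RowNumber [6]int64

// compareUpTo is a hypothetical stand-in for CompareRowNumbers. It compares
// levels 0..upToDefinitionLevel and ignores any deeper levels.
func compareUpTo(upToDefinitionLevel int, a, b RowNumber) int {
	for i := 0; i <= upToDefinitionLevel; i++ {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

func main() {
	a := RowNumber{0, 1, 0, -1, -1, -1}
	b := RowNumber{0, 1, 2, -1, -1, -1}
	fmt.Println(compareUpTo(1, a, b)) // 0: levels 0 and 1 agree
	fmt.Println(compareUpTo(2, a, b)) // -1: level 2 differs (0 < 2)
}
```

With definition level 1 the two row numbers share a lineage; with definition level 2 they no longer do.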

func GetColumnIndexByPath

func GetColumnIndexByPath(pf *pq.File, s string) (index, depth int)

func HasColumn

func HasColumn(pf *pq.File, s string) bool

Types

type ColumnIterator

type ColumnIterator struct {
	// contains filtered or unexported fields
}

ColumnIterator asynchronously iterates through the given row groups and column. Applies the optional predicate to each chunk, page, and value. Results are read by calling Next() until it returns nil.

func NewColumnIterator

func NewColumnIterator(ctx context.Context, rgs []pq.RowGroup, column int, columnName string, readSize int, filter Predicate, selectAs string) *ColumnIterator

func (*ColumnIterator) Close

func (c *ColumnIterator) Close()

func (*ColumnIterator) Next

func (c *ColumnIterator) Next() (*IteratorResult, error)

Next returns the next matching value from the iterator. Returns nil when finished.
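The "call Next until it returns nil" contract can be sketched as follows. The types result, iterator, and sliceIterator are hypothetical, simplified stand-ins so the example runs without a parquet file; only the loop shape is the point.

```go
package main

import "fmt"

// result is a hypothetical, simplified stand-in for IteratorResult.
type result struct {
	Row int64
}

// iterator is a trimmed-down version of the documented interface
// (SeekTo and String are omitted to keep the sketch short).
type iterator interface {
	Next() (*result, error)
	Close()
}

// sliceIterator yields a fixed list of rows, then nil.
type sliceIterator struct {
	rows []int64
	pos  int
}

func (s *sliceIterator) Next() (*result, error) {
	if s.pos >= len(s.rows) {
		return nil, nil // a nil result signals the end
	}
	r := &result{Row: s.rows[s.pos]}
	s.pos++
	return r, nil
}

func (s *sliceIterator) Close() {}

// drain reads every result until Next returns nil or an error.
func drain(it iterator) ([]int64, error) {
	defer it.Close()
	var out []int64
	for {
		res, err := it.Next()
		if err != nil {
			return out, err
		}
		if res == nil {
			return out, nil
		}
		out = append(out, res.Row)
	}
}

func main() {
	rows, err := drain(&sliceIterator{rows: []int64{0, 3, 7}})
	fmt.Println(rows, err) // [0 3 7] <nil>
}
```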

func (*ColumnIterator) SeekTo

func (c *ColumnIterator) SeekTo(to RowNumber, d int) (*IteratorResult, error)

SeekTo moves this iterator to the next result that is greater than or equal to the given row number, compared through the given definition level.
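To show how the definition level changes what "greater than or equal" means, here is a small sketch over an in-memory, sorted list of row numbers. The helpers compareUpTo and seekTo are hypothetical; the real iterator reads pages lazily rather than scanning a slice.

```go
package main

import "fmt"

type RowNumber [6]int64

// compareUpTo compares levels 0..d only (hypothetical helper).
func compareUpTo(d int, a, b RowNumber) int {
	for i := 0; i <= d; i++ {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// seekTo returns the index of the first row that is >= to when compared
// through definition level d, or -1 if there is none.
func seekTo(rows []RowNumber, to RowNumber, d int) int {
	for i, r := range rows {
		if compareUpTo(d, r, to) >= 0 {
			return i
		}
	}
	return -1
}

func main() {
	rows := []RowNumber{
		{0, 0, -1, -1, -1, -1},
		{0, 1, -1, -1, -1, -1},
		{2, 0, -1, -1, -1, -1},
	}
	to := RowNumber{0, 1, -1, -1, -1, -1}
	fmt.Println(seekTo(rows, to, 0)) // 0: any row in root 0 qualifies
	fmt.Println(seekTo(rows, to, 1)) // 1: must also reach child 1
}
```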

func (*ColumnIterator) String

func (c *ColumnIterator) String() string

type DictionaryPredicateHelper

type DictionaryPredicateHelper struct {
	// contains filtered or unexported fields
}

DictionaryPredicateHelper is a helper for a predicate that uses a dictionary for filtering.

There is one dictionary per ColumnChunk/RowGroup, but it is not accessible in KeepColumnChunk. This helper saves the result of KeepPage and uses it for all pages in the row group. It also has a basic heuristic for choosing not to check the dictionary at all if the cardinality is too high.
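The caching behaviour described above might look roughly like the following sketch. Everything here is an assumption for illustration: the dictHelper type, its method names, the string dictionary, and the maxDictLen cutoff are not taken from the package.

```go
package main

import "fmt"

// maxDictLen is an assumed cutoff; the real heuristic is not documented here.
const maxDictLen = 1000

// dictHelper caches one keep/skip decision per row group (hypothetical type).
type dictHelper struct {
	decided bool // true once the current row group has been evaluated
	keep    bool // cached decision for all pages in the row group
	scans   int  // number of dictionary scans, for illustration only
}

// newRowGroup clears the cached decision.
func (h *dictHelper) newRowGroup() { h.decided = false }

// keepPage scans the dictionary at most once per row group.
func (h *dictHelper) keepPage(dict []string, match func(string) bool) bool {
	if h.decided {
		return h.keep
	}
	h.decided = true
	h.keep = true // default when the dictionary is not checked
	if len(dict) > maxDictLen {
		return h.keep // cardinality too high, skip the scan
	}
	h.scans++
	h.keep = false
	for _, v := range dict {
		if match(v) {
			h.keep = true
			break
		}
	}
	return h.keep
}

func main() {
	h := &dictHelper{}
	dict := []string{"a", "b"}
	isC := func(s string) bool { return s == "c" }
	for page := 0; page < 3; page++ {
		fmt.Println(h.keepPage(dict, isC)) // false each time
	}
	fmt.Println("dictionary scans:", h.scans) // dictionary scans: 1
}
```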

type FloatBetweenPredicate

type FloatBetweenPredicate struct {
	// contains filtered or unexported fields
}

func NewFloatBetweenPredicate

func NewFloatBetweenPredicate(min, max float64) *FloatBetweenPredicate

func (*FloatBetweenPredicate) KeepColumnChunk

func (p *FloatBetweenPredicate) KeepColumnChunk(c pq.ColumnChunk) bool

func (*FloatBetweenPredicate) KeepPage

func (p *FloatBetweenPredicate) KeepPage(page pq.Page) bool

func (*FloatBetweenPredicate) KeepValue

func (p *FloatBetweenPredicate) KeepValue(v pq.Value) bool

func (*FloatBetweenPredicate) String

func (p *FloatBetweenPredicate) String() string

type GenericPredicate

type GenericPredicate[T any] struct {
	Fn      func(T) bool
	RangeFn func(min, max T) bool
	Extract func(pq.Value) T
	// contains filtered or unexported fields
}

GenericPredicate is a generic predicate with callbacks to evaluate data of type T. Fn evaluates a single data point and is required. Optionally, a RangeFn can evaluate a min/max range; when RangeFn is supplied and the column chunk or page also includes bounds metadata, it is used to skip column chunks and pages.
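The split between a per-value callback and an optional range callback can be sketched with a simplified generic type. The predicate type below is a hypothetical stand-in that works on plain Go values, so it needs no Extract callback and no pq.Value.

```go
package main

import "fmt"

// predicate is a simplified, hypothetical stand-in for GenericPredicate[T].
type predicate[T any] struct {
	Fn      func(T) bool
	RangeFn func(lo, hi T) bool
}

// keepRange decides whether a chunk or page with the given bounds
// could contain a match.
func (p predicate[T]) keepRange(lo, hi T) bool {
	if p.RangeFn == nil {
		return true // no range callback, so nothing can be skipped
	}
	return p.RangeFn(lo, hi)
}

// keepValue evaluates a single data point.
func (p predicate[T]) keepValue(v T) bool { return p.Fn(v) }

func main() {
	// Match values greater than 100.
	p := predicate[int64]{
		Fn:      func(v int64) bool { return v > 100 },
		RangeFn: func(lo, hi int64) bool { return hi > 100 },
	}
	fmt.Println(p.keepRange(0, 50))   // false: the whole range is skipped
	fmt.Println(p.keepRange(90, 200)) // true: values must be inspected
	fmt.Println(p.keepValue(150))     // true
}
```

RangeFn only needs to answer "could anything in this range match?", so it may return true conservatively; Fn remains the final check.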

func NewBoolPredicate

func NewBoolPredicate(b bool) *GenericPredicate[bool]

func NewFloatPredicate

func NewFloatPredicate(fn func(float64) bool, rangeFn func(float64, float64) bool) *GenericPredicate[float64]

func NewGenericPredicate

func NewGenericPredicate[T any](fn func(T) bool, rangeFn func(T, T) bool, extract func(pq.Value) T) *GenericPredicate[T]

func NewIntPredicate

func NewIntPredicate(fn func(int64) bool, rangeFn func(int64, int64) bool) *GenericPredicate[int64]

func (*GenericPredicate[T]) KeepColumnChunk

func (p *GenericPredicate[T]) KeepColumnChunk(c pq.ColumnChunk) bool

func (*GenericPredicate[T]) KeepPage

func (p *GenericPredicate[T]) KeepPage(page pq.Page) bool

func (*GenericPredicate[T]) KeepValue

func (p *GenericPredicate[T]) KeepValue(v pq.Value) bool

func (*GenericPredicate[T]) String

func (p *GenericPredicate[T]) String() string

type GroupPredicate

type GroupPredicate interface {
	fmt.Stringer

	KeepGroup(*IteratorResult) bool
}

type InstrumentedPredicate

type InstrumentedPredicate struct {
	InspectedColumnChunks int64
	InspectedPages        int64
	InspectedValues       int64
	KeptColumnChunks      int64
	KeptPages             int64
	KeptValues            int64
	// contains filtered or unexported fields
}

func (*InstrumentedPredicate) KeepColumnChunk

func (p *InstrumentedPredicate) KeepColumnChunk(c pq.ColumnChunk) bool

func (*InstrumentedPredicate) KeepPage

func (p *InstrumentedPredicate) KeepPage(page pq.Page) bool

func (*InstrumentedPredicate) KeepValue

func (p *InstrumentedPredicate) KeepValue(v pq.Value) bool

func (*InstrumentedPredicate) String

func (p *InstrumentedPredicate) String() string

type IntBetweenPredicate

type IntBetweenPredicate struct {
	// contains filtered or unexported fields
}

IntBetweenPredicate checks for an int within the bounds [min, max], inclusive.

func NewIntBetweenPredicate

func NewIntBetweenPredicate(min, max int64) *IntBetweenPredicate

func (*IntBetweenPredicate) KeepColumnChunk

func (p *IntBetweenPredicate) KeepColumnChunk(c pq.ColumnChunk) bool

func (*IntBetweenPredicate) KeepPage

func (p *IntBetweenPredicate) KeepPage(page pq.Page) bool

func (*IntBetweenPredicate) KeepValue

func (p *IntBetweenPredicate) KeepValue(v pq.Value) bool

func (*IntBetweenPredicate) String

func (p *IntBetweenPredicate) String() string

type Iterator

type Iterator interface {
	fmt.Stringer

	// Next returns nil when done
	Next() (*IteratorResult, error)

	// Like Next but skips over results until reading >= the given location
	SeekTo(t RowNumber, definitionLevel int) (*IteratorResult, error)

	Close()
}

Iterator is the interface that every iterator follows, so iterators can be composed.

type IteratorResult

type IteratorResult struct {
	RowNumber RowNumber
	Entries   []struct {
		Key   string
		Value pq.Value
	}
	OtherEntries []struct {
		Key   string
		Value interface{}
	}
}

IteratorResult is a row of data with a row number and named columns of data. Internally it has an unstructured list for efficient collection. The ToMap() function can be used to make inspection easier.

func (*IteratorResult) Append

func (r *IteratorResult) Append(rr *IteratorResult)

func (*IteratorResult) AppendOtherValue

func (r *IteratorResult) AppendOtherValue(k string, v interface{})

func (*IteratorResult) AppendValue

func (r *IteratorResult) AppendValue(k string, v pq.Value)

func (*IteratorResult) Columns

func (r *IteratorResult) Columns(buffer [][]pq.Value, names ...string) [][]pq.Value

Columns gets the values for each named column. The order of returned values matches the order of names given. This is more efficient than converting to a map.

func (*IteratorResult) OtherValueFromKey

func (r *IteratorResult) OtherValueFromKey(k string) interface{}

func (*IteratorResult) Reset

func (r *IteratorResult) Reset()

func (*IteratorResult) ToMap

func (r *IteratorResult) ToMap() map[string][]pq.Value

ToMap converts the unstructured list of data into a map containing an entry for each column, and the lists of values. The order of columns is not preserved, but the order of values within each column is.
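The grouping that ToMap performs can be sketched with plain strings in place of pq.Value. The entry type and toMap function are hypothetical and only illustrate the ordering guarantee.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for one Key/Value pair of IteratorResult.
type entry struct {
	Key   string
	Value string
}

// toMap groups values by key, keeping the order of values within each key.
func toMap(entries []entry) map[string][]string {
	m := make(map[string][]string)
	for _, e := range entries {
		m[e.Key] = append(m[e.Key], e.Value)
	}
	return m
}

func main() {
	entries := []entry{{"name", "a"}, {"tag", "x"}, {"name", "b"}}
	m := toMap(entries)
	fmt.Println(m["name"], m["tag"]) // [a b] [x]
}
```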

type JoinIterator

type JoinIterator struct {
	// contains filtered or unexported fields
}

JoinIterator joins two or more iterators for matches at the given definition level. I.e. joining at definitionLevel=0 means that each iterator must produce a result within the same root node.
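As a rough illustration of joining at a definition level, the sketch below merge-joins two sorted lists of row numbers. It is deliberately simplified: it pairs one row from each side per match, whereas a real join would collect every result in the group. The names compareUpTo and join are hypothetical.

```go
package main

import "fmt"

type RowNumber [6]int64

// compareUpTo compares levels 0..d only (hypothetical helper).
func compareUpTo(d int, a, b RowNumber) int {
	for i := 0; i <= d; i++ {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// join pairs rows from two sorted inputs that agree through level d.
func join(d int, a, b []RowNumber) [][2]RowNumber {
	var out [][2]RowNumber
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		c := compareUpTo(d, a[i], b[j])
		switch {
		case c < 0:
			i++ // a is behind, advance it
		case c > 0:
			j++ // b is behind, advance it
		default:
			out = append(out, [2]RowNumber{a[i], b[j]})
			i++
			j++
		}
	}
	return out
}

func main() {
	a := []RowNumber{{0, 0, -1, -1, -1, -1}, {0, 1, -1, -1, -1, -1}, {2, 0, -1, -1, -1, -1}}
	b := []RowNumber{{0, 1, -1, -1, -1, -1}, {1, 0, -1, -1, -1, -1}, {2, 3, -1, -1, -1, -1}}
	fmt.Println(len(join(0, a, b)), len(join(1, a, b))) // 2 1
}
```

At definitionLevel=0 the inputs match in root rows 0 and 2; at definitionLevel=1 only the row {0, 1} lines up on both sides.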

func NewJoinIterator

func NewJoinIterator(definitionLevel int, iters []Iterator, pred GroupPredicate) *JoinIterator

func (*JoinIterator) Close

func (j *JoinIterator) Close()

func (*JoinIterator) Next

func (j *JoinIterator) Next() (*IteratorResult, error)

func (*JoinIterator) SeekTo

func (j *JoinIterator) SeekTo(t RowNumber, d int) (*IteratorResult, error)

func (*JoinIterator) String

func (j *JoinIterator) String() string

type KeyValueGroupPredicate

type KeyValueGroupPredicate struct {
	// contains filtered or unexported fields
}

KeyValueGroupPredicate takes key/value pairs and checks if the group contains all of them. This is the only predicate/iterator that is knowledgeable about our snapshot or search contents. I'd like to change that and make it generic, but it's quite complex and I have not figured it out yet.

func NewKeyValueGroupPredicate

func NewKeyValueGroupPredicate(keys, values []string) *KeyValueGroupPredicate

func (*KeyValueGroupPredicate) KeepGroup

func (a *KeyValueGroupPredicate) KeepGroup(group *IteratorResult) bool

KeepGroup checks if the given group contains all of the requested key/value pairs.

func (*KeyValueGroupPredicate) String

func (a *KeyValueGroupPredicate) String() string

type LeftJoinIterator

type LeftJoinIterator struct {
	// contains filtered or unexported fields
}

LeftJoinIterator joins two or more iterators for matches at the given definition level. The first set of required iterators must all produce matching results. The second set of optional iterators are collected if they also match. TODO - This should technically obsolete the JoinIterator.
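The required/optional distinction can be sketched over root-level row numbers only. The leftJoin function below is hypothetical and assumes each input holds distinct, sorted root rows; it reports, for each emitted row, how many optional inputs also had it.

```go
package main

import "fmt"

// contains reports whether rows includes r.
func contains(rows []int64, r int64) bool {
	for _, v := range rows {
		if v == r {
			return true
		}
	}
	return false
}

// leftJoin emits a row only if every required input has it. For each
// emitted row it also counts the optional inputs that matched.
func leftJoin(required, optional [][]int64) ([]int64, []int) {
	var rows []int64
	var optionalHits []int
	if len(required) == 0 {
		return rows, optionalHits
	}
	for _, r := range required[0] {
		inAll := true
		for _, other := range required[1:] {
			if !contains(other, r) {
				inAll = false
				break
			}
		}
		if !inAll {
			continue // a required input is missing this row
		}
		hits := 0
		for _, opt := range optional {
			if contains(opt, r) {
				hits++
			}
		}
		rows = append(rows, r)
		optionalHits = append(optionalHits, hits)
	}
	return rows, optionalHits
}

func main() {
	required := [][]int64{{1, 2, 3}, {2, 3, 4}}
	optional := [][]int64{{3}, {2, 3}}
	fmt.Println(leftJoin(required, optional)) // [2 3] [1 2]
}
```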

func NewLeftJoinIterator

func NewLeftJoinIterator(definitionLevel int, required, optional []Iterator, pred GroupPredicate) *LeftJoinIterator

func (*LeftJoinIterator) Close

func (j *LeftJoinIterator) Close()

func (*LeftJoinIterator) Next

func (j *LeftJoinIterator) Next() (*IteratorResult, error)

func (*LeftJoinIterator) SeekTo

func (j *LeftJoinIterator) SeekTo(t RowNumber, d int) (*IteratorResult, error)

func (*LeftJoinIterator) String

func (j *LeftJoinIterator) String() string

type OrPredicate

type OrPredicate struct {
	// contains filtered or unexported fields
}

func NewOrPredicate

func NewOrPredicate(preds ...Predicate) *OrPredicate

func (*OrPredicate) KeepColumnChunk

func (p *OrPredicate) KeepColumnChunk(c pq.ColumnChunk) bool

func (*OrPredicate) KeepPage

func (p *OrPredicate) KeepPage(page pq.Page) bool

func (*OrPredicate) KeepValue

func (p *OrPredicate) KeepValue(v pq.Value) bool

func (*OrPredicate) String

func (p *OrPredicate) String() string

type Predicate

type Predicate interface {
	fmt.Stringer

	KeepColumnChunk(cc pq.ColumnChunk) bool
	KeepPage(page pq.Page) bool
	KeepValue(pq.Value) bool
}

Predicate is a pushdown predicate that can be applied at the chunk, page, and value levels.
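To make the three pushdown levels concrete, the sketch below filters an in-memory layout of chunks and pages that carry min/max bounds. The chunk and page types and the scan function are hypothetical; they stand in for pq.ColumnChunk and pq.Page only to show where work is skipped.

```go
package main

import "fmt"

// page and chunk are hypothetical stand-ins carrying bounds metadata.
type page struct {
	min, max int64
	values   []int64
}

type chunk struct {
	min, max int64
	pages    []page
}

// scan applies a "value == target" filter at the chunk, page, and value
// levels, and reports how many values were actually inspected.
func scan(chunks []chunk, target int64) (matches []int64, inspected int) {
	inRange := func(lo, hi int64) bool { return lo <= target && target <= hi }
	for _, c := range chunks {
		if !inRange(c.min, c.max) {
			continue // chunk level: skip every page in it
		}
		for _, p := range c.pages {
			if !inRange(p.min, p.max) {
				continue // page level: skip every value in it
			}
			for _, v := range p.values {
				inspected++
				if v == target { // value level
					matches = append(matches, v)
				}
			}
		}
	}
	return matches, inspected
}

func main() {
	chunks := []chunk{
		{min: 1, max: 10, pages: []page{
			{min: 1, max: 5, values: []int64{1, 3, 5}},
			{min: 6, max: 10, values: []int64{6, 7, 10}},
		}},
		{min: 20, max: 30, pages: []page{
			{min: 20, max: 30, values: []int64{20, 25, 30}},
		}},
	}
	fmt.Println(scan(chunks, 7)) // [7] 3
}
```

Of nine stored values only three are inspected, because one chunk and one page are rejected from their bounds alone.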

func NewStringInPredicate

func NewStringInPredicate(ss []string) Predicate

type RegexInPredicate

type RegexInPredicate struct {
	// contains filtered or unexported fields
}

RegexInPredicate checks for a match against any of the given regexes. Results are memoized, and the memo resets on each row group.
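Memoized regex matching can be sketched with the standard regexp package. The regexIn type, its reset method, and the evals counter are hypothetical; the sketch only shows that a repeated value is not matched against the regexes a second time.

```go
package main

import (
	"fmt"
	"regexp"
)

// regexIn is a hypothetical sketch of a memoizing "matches any" predicate.
type regexIn struct {
	regs  []*regexp.Regexp
	cache map[string]bool
	evals int // number of uncached evaluations, for illustration only
}

func newRegexIn(patterns []string) (*regexIn, error) {
	p := &regexIn{cache: map[string]bool{}}
	for _, pat := range patterns {
		r, err := regexp.Compile(pat)
		if err != nil {
			return nil, err
		}
		p.regs = append(p.regs, r)
	}
	return p, nil
}

// reset clears the memo; here it stands for the start of a new row group.
func (p *regexIn) reset() { p.cache = map[string]bool{} }

// keep reports whether s matches any regex, consulting the memo first.
func (p *regexIn) keep(s string) bool {
	if v, ok := p.cache[s]; ok {
		return v
	}
	p.evals++
	matched := false
	for _, r := range p.regs {
		if r.MatchString(s) {
			matched = true
			break
		}
	}
	p.cache[s] = matched
	return matched
}

func main() {
	p, err := newRegexIn([]string{"^GET ", "^POST "})
	if err != nil {
		panic(err)
	}
	for _, s := range []string{"GET /a", "PUT /b", "GET /a"} {
		fmt.Println(s, p.keep(s))
	}
	fmt.Println("regex evaluations:", p.evals) // regex evaluations: 2
}
```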

func NewRegexInPredicate

func NewRegexInPredicate(regs []string) (*RegexInPredicate, error)

func (*RegexInPredicate) KeepColumnChunk

func (p *RegexInPredicate) KeepColumnChunk(pq.ColumnChunk) bool

func (*RegexInPredicate) KeepPage

func (p *RegexInPredicate) KeepPage(page pq.Page) bool

func (*RegexInPredicate) KeepValue

func (p *RegexInPredicate) KeepValue(v pq.Value) bool

func (*RegexInPredicate) String

func (p *RegexInPredicate) String() string

type RegexNotInPredicate

type RegexNotInPredicate struct {
	// contains filtered or unexported fields
}

RegexNotInPredicate checks that a value matches none of the given regexes. Results are memoized, and the memo resets on each row group.

func NewRegexNotInPredicate

func NewRegexNotInPredicate(regs []string) (*RegexNotInPredicate, error)

func (*RegexNotInPredicate) KeepColumnChunk

func (p *RegexNotInPredicate) KeepColumnChunk(pq.ColumnChunk) bool

func (*RegexNotInPredicate) KeepPage

func (p *RegexNotInPredicate) KeepPage(page pq.Page) bool

func (*RegexNotInPredicate) KeepValue

func (p *RegexNotInPredicate) KeepValue(v pq.Value) bool

func (*RegexNotInPredicate) String

func (p *RegexNotInPredicate) String() string

type RowNumber

type RowNumber [6]int64

RowNumber is the sequence of row numbers uniquely identifying a value in a tree of nested columns, starting at the top-level and including another row number for each level of nesting. -1 is a placeholder for undefined at lower levels. RowNumbers can be compared for full equality using the == operator, or can be compared partially, looking for equal lineages down to a certain level. For example, given the following tree, the row numbers would be:

A          0, -1, -1
  B        0,  0, -1
  C        0,  1, -1
    D      0,  1,  0
  E        0,  2, -1

Currently supports 6 levels of nesting which should be enough for anybody. :)

func EmptyRowNumber

func EmptyRowNumber() RowNumber

EmptyRowNumber creates an empty invalid row number.

func MaxRowNumber

func MaxRowNumber() RowNumber

MaxRowNumber is a helper that represents the maximum(-ish) representable value.

func TruncateRowNumber

func TruncateRowNumber(definitionLevelToKeep int, t RowNumber) RowNumber

func (*RowNumber) Next

func (t *RowNumber) Next(repetitionLevel, definitionLevel int)

Next increments and resets the row numbers according to the given repetition and definition levels. Examples from the Dremel whitepaper: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/36632.pdf

Name.Language.Country

value  | r | d | expected RowNumber
-------|---|---|-------------------
       |   |   | { -1, -1, -1, -1 }  <-- starting position
us     | 0 | 3 | {  0,  0,  0,  0 }
null   | 2 | 2 | {  0,  0,  1, -1 }
null   | 1 | 1 | {  0,  1, -1, -1 }
gb     | 1 | 3 | {  0,  2,  0,  0 }
null   | 0 | 1 | {  1,  0, -1, -1 }
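The Name.Language.Country example can be reproduced with a short sketch of one increment-and-reset rule that is consistent with the listed values. The lowercase next method is an assumption for illustration and may differ from the package's own implementation.

```go
package main

import "fmt"

type RowNumber [6]int64

// next sketches the rule: bump the counter at the repetition level, start
// new children at 0 through the definition level, and mark everything
// deeper as undefined (-1).
func (t *RowNumber) next(repetitionLevel, definitionLevel int) {
	t[repetitionLevel]++
	for i := repetitionLevel + 1; i <= definitionLevel; i++ {
		t[i] = 0
	}
	for i := definitionLevel + 1; i < len(t); i++ {
		t[i] = -1
	}
}

func main() {
	rn := RowNumber{-1, -1, -1, -1, -1, -1} // starting position
	steps := []struct{ r, d int }{{0, 3}, {2, 2}, {1, 1}, {1, 3}, {0, 1}}
	for _, s := range steps {
		rn.next(s.r, s.d)
		fmt.Println(rn[:4]) // first four levels, as in the example
	}
}
```

Running it prints the five expected row numbers in order, from [0 0 0 0] through [1 0 -1 -1].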

func (*RowNumber) Skip

func (t *RowNumber) Skip(numRows int64)

Skip rows at the root-level.

func (RowNumber) Valid

func (t RowNumber) Valid() bool

type SkipNilsPredicate

type SkipNilsPredicate struct{}

func NewSkipNilsPredicate

func NewSkipNilsPredicate() *SkipNilsPredicate

func (*SkipNilsPredicate) KeepColumnChunk

func (p *SkipNilsPredicate) KeepColumnChunk(pq.ColumnChunk) bool

func (*SkipNilsPredicate) KeepPage

func (p *SkipNilsPredicate) KeepPage(page pq.Page) bool

func (*SkipNilsPredicate) KeepValue

func (p *SkipNilsPredicate) KeepValue(v pq.Value) bool

func (*SkipNilsPredicate) String

func (p *SkipNilsPredicate) String() string

type StringInPredicate

type StringInPredicate struct {
	// contains filtered or unexported fields
}

StringInPredicate checks for any of the given strings, using case-sensitive exact byte matching.
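Exact byte matching means no case folding or normalization is applied. A minimal sketch, using a hypothetical stringIn helper built on bytes.Equal:

```go
package main

import (
	"bytes"
	"fmt"
)

// stringIn reports whether v is byte-for-byte equal to any candidate
// (hypothetical helper, not the package's implementation).
func stringIn(candidates [][]byte, v []byte) bool {
	for _, c := range candidates {
		if bytes.Equal(c, v) {
			return true
		}
	}
	return false
}

func main() {
	candidates := [][]byte{[]byte("GET"), []byte("POST")}
	fmt.Println(stringIn(candidates, []byte("GET"))) // true
	fmt.Println(stringIn(candidates, []byte("get"))) // false: case differs
}
```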

func (*StringInPredicate) KeepColumnChunk

func (p *StringInPredicate) KeepColumnChunk(cc pq.ColumnChunk) bool

func (*StringInPredicate) KeepPage

func (p *StringInPredicate) KeepPage(page pq.Page) bool

func (*StringInPredicate) KeepValue

func (p *StringInPredicate) KeepValue(v pq.Value) bool

func (*StringInPredicate) String

func (p *StringInPredicate) String() string

type SubstringPredicate

type SubstringPredicate struct {
	// contains filtered or unexported fields
}

func NewSubstringPredicate

func NewSubstringPredicate(substring string) *SubstringPredicate

func (*SubstringPredicate) KeepColumnChunk

func (p *SubstringPredicate) KeepColumnChunk(pq.ColumnChunk) bool

func (*SubstringPredicate) KeepPage

func (p *SubstringPredicate) KeepPage(page pq.Page) bool

func (*SubstringPredicate) KeepValue

func (p *SubstringPredicate) KeepValue(v pq.Value) bool

func (*SubstringPredicate) String

func (p *SubstringPredicate) String() string

type UnionIterator

type UnionIterator struct {
	// contains filtered or unexported fields
}

UnionIterator produces all results for all given iterators. When iterators align to the same row, based on the configured definition level, the results are returned together. Otherwise the result of the next matching iterator is returned.
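The alignment behaviour can be sketched over sorted root-level row numbers. The union function below is hypothetical: it emits each row once and records which inputs produced it, which corresponds to joining at definition level 0.

```go
package main

import "fmt"

// union merges sorted root-level rows from several inputs. A row that
// appears in more than one input is emitted once, together with the
// indexes of all inputs that produced it.
func union(inputs [][]int64) ([]int64, [][]int) {
	pos := make([]int, len(inputs))
	var rows []int64
	var sources [][]int
	for {
		found := false
		var lowest int64
		for i, in := range inputs {
			if pos[i] < len(in) && (!found || in[pos[i]] < lowest) {
				lowest = in[pos[i]]
				found = true
			}
		}
		if !found {
			return rows, sources // every input is exhausted
		}
		var from []int
		for i, in := range inputs {
			if pos[i] < len(in) && in[pos[i]] == lowest {
				from = append(from, i) // this input aligns on the row
				pos[i]++
			}
		}
		rows = append(rows, lowest)
		sources = append(sources, from)
	}
}

func main() {
	rows, sources := union([][]int64{{0, 2}, {1, 2}})
	fmt.Println(rows, sources) // [0 1 2] [[0] [1] [0 1]]
}
```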

func NewUnionIterator

func NewUnionIterator(definitionLevel int, iters []Iterator, pred GroupPredicate) *UnionIterator

func (*UnionIterator) Close

func (u *UnionIterator) Close()

func (*UnionIterator) Next

func (u *UnionIterator) Next() (*IteratorResult, error)

func (*UnionIterator) SeekTo

func (u *UnionIterator) SeekTo(t RowNumber, d int) (*IteratorResult, error)

func (*UnionIterator) String

func (u *UnionIterator) String() string
