internal

package
Version: v0.1.0 (Latest)
Warning: This package is not in the latest version of its module.
Published: Nov 13, 2024 | License: Apache-2.0 | Imports: 16 | Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func MakeSequencedChan

func MakeSequencedChan[T any](bufferSize uint, source <-chan T, comesAfter, isNext func(a, b *T) bool, initial T) <-chan T

MakeSequencedChan creates a channel that outputs values in the order defined by the comesAfter and isNext functions. Values are read from the provided source and reordered before being sent to the output channel.

Types

type Enumerated

type Enumerated[T any] struct {
	Value T
	Index int
	Last  bool
}

Enumerated is a lightweight way to represent a sequenced value: it records the value's position (Index) so that values can be processed in parallel and then put back in their original order.

type FileReader

type FileReader interface {
	io.Closer

	// PrunedSchema takes in the list of projected field IDs and returns the arrow schema
	// that represents the underlying file schema with only the projected fields. It also
	// returns the indexes of the projected columns to allow reading *only* the needed
	// columns.
	PrunedSchema(projectedIDs map[int]struct{}) (*arrow.Schema, []int, error)
	// GetRecords returns a record reader for only the provided columns (passing nil reads
	// all of the columns of the underlying file). The `tester` is a function that, if
	// non-nil, can be used to filter parts of the file, such as skipping row groups in a parquet file.
	GetRecords(ctx context.Context, cols []int, tester any) (array.RecordReader, error)
	// ReadTable reads the entire file and returns it as an arrow table.
	ReadTable(context.Context) (arrow.Table, error)
}

type FileSource

type FileSource interface {
	GetReader(context.Context) (FileReader, error)
}

func GetFile

func GetFile(ctx context.Context, fs iceio.IO, dataFile iceberg.DataFile, isPosDeletes bool) (FileSource, error)

GetFile opens dataFile using the provided file system and returns a FileSource for it.

The FileSource interface abstracts away the underlying file format while providing utilities to read the file as Arrow record batches.

type ParquetFileSource

type ParquetFileSource struct {
	// contains filtered or unexported fields
}

func (*ParquetFileSource) GetReader

func (pfs *ParquetFileSource) GetReader(ctx context.Context) (FileReader, error)
