blob

package
v0.26.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 25, 2023 License: Apache-2.0 Imports: 28 Imported by: 0

README

runtime/blob

Package blob provides a way to download a batch of files ingested from remote sources like s3/gcs using google's go cdk (https://pkg.go.dev/gocloud.dev) as per user's glob pattern.

How many files are downloaded and how much data from a file is downloaded is controlled by runtimev1.Source_ExtractPolicy It also has support for ingesting partial files for some formats like parquet, unzipped csv/txt/tsv files.

It uses a planner to implement strategies for downloading.

A planner has a container which keeps track of files to be downloaded and a rowplanner which plans how much data per file needs to be downloaded.

For partial parquet file ingestion it uses apache arrow for go : https://github.com/apache/arrow/tree/master/go

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewIterator added in v0.21.0

func NewIterator(ctx context.Context, bucket *blob.Bucket, opts Options, l *zap.Logger) (connectors.FileIterator, error)

NewIterator returns new instance of blobIterator the iterator keeps list of blob objects eagerly planned as per user's glob pattern and extract policies clients should call close once done to release all resources like closing the bucket the iterator takes responsibility of closing the bucket

Types

type ObjectReader added in v0.21.0

type ObjectReader struct {
	// contains filtered or unexported fields
}

ObjectReader reads range of bytes from cloud objects implements io.ReaderAt and io.Seeker interfaces

func NewBlobObjectReader added in v0.21.0

func NewBlobObjectReader(ctx context.Context, bucket *blob.Bucket, obj *blob.ListObject) *ObjectReader

NewBlobObjectReader returns new instance of ObjectReader

func (*ObjectReader) Close added in v0.21.0

func (f *ObjectReader) Close() error

Close frees up resources if any clients should call Close once done with reader

func (*ObjectReader) Read added in v0.21.0

func (f *ObjectReader) Read(p []byte) (int, error)

Read implements io.Reader interface

func (*ObjectReader) ReadAt added in v0.21.0

func (f *ObjectReader) ReadAt(p []byte, off int64) (int, error)

ReadAt implements io.ReaderAt interface

func (*ObjectReader) Seek added in v0.21.0

func (f *ObjectReader) Seek(offset int64, whence int) (int64, error)

Seek implements io.Seeker interface

func (*ObjectReader) Size added in v0.21.0

func (f *ObjectReader) Size() int64

Size returns size of the object

type Options added in v0.21.0

type Options struct {
	GlobMaxTotalSize      int64
	GlobMaxObjectsMatched int
	GlobMaxObjectsListed  int64
	GlobPageSize          int
	ExtractPolicy         *runtimev1.Source_ExtractPolicy
	GlobPattern           string
	// Although at this point GlobMaxTotalSize and StorageLimitInBytes have same impl but
	// this is total size the source should consume on disk and is calculated upstream basis how much data one instance has already consumed
	// across other sources and the instance level limits
	StorageLimitInBytes int64
}

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL