Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func NewIterator ¶
func NewIterator(ctx context.Context, bucket *blob.Bucket, opts Options, l *zap.Logger) (drivers.FileIterator, error)
NewIterator returns an iterator for downloading objects matching a glob pattern and extract policy. The downloaded objects will be stored in a temporary directory with the same file hierarchy as in the bucket, enabling parsing of hive partitioning on the downloaded files. The client should call Close() once done to release all resources. Calling Close() on the iterator will also close the bucket.
Types ¶
type Bucket ¶ added in v0.48.0
type Bucket struct {
// contains filtered or unexported fields
}
Bucket wraps a blob.Bucket with functionality for implementing the drivers.ObjectStore interface. NOTE: It currently only supports listing objects, but eventually we should refactor NewIterator to a member function of this struct.
func NewBucket ¶ added in v0.48.0
NewBucket wraps a *blob.Bucket. It takes ownership of the bucket and will close it when Close is called.
func (*Bucket) ListObjects ¶ added in v0.48.0
ListObjects lists objects in the bucket that match the given glob pattern. The glob pattern should be a valid path *without* scheme or bucket name. E.g. to list gs://my-bucket/path/to/files/*, the glob pattern should be "path/to/files/*".
type ExtractPolicy ¶ added in v0.34.0
type ExtractPolicy struct { RowsStrategy ExtractPolicyStrategy RowsLimitBytes uint64 FilesStrategy ExtractPolicyStrategy FilesLimit uint64 }
func ParseExtractPolicy ¶ added in v0.34.0
func ParseExtractPolicy(cfg map[string]any) (*ExtractPolicy, error)
type ExtractPolicyStrategy ¶ added in v0.34.0
type ExtractPolicyStrategy int
const ( ExtractPolicyStrategyUnspecified ExtractPolicyStrategy = 0 ExtractPolicyStrategyHead ExtractPolicyStrategy = 1 ExtractPolicyStrategyTail ExtractPolicyStrategy = 2 )
func (ExtractPolicyStrategy) String ¶ added in v0.34.0
func (s ExtractPolicyStrategy) String() string
type ObjectReader ¶
type ObjectReader struct {
// contains filtered or unexported fields
}
ObjectReader reads range of bytes from cloud objects implements io.ReaderAt and io.Seeker interfaces
func NewBlobObjectReader ¶
func NewBlobObjectReader(ctx context.Context, bucket *blob.Bucket, obj *blob.ListObject) *ObjectReader
NewBlobObjectReader returns new instance of ObjectReader
func (*ObjectReader) Close ¶
func (f *ObjectReader) Close() error
Close frees up resources if any clients should call Close once done with reader
func (*ObjectReader) Read ¶
func (f *ObjectReader) Read(p []byte) (int, error)
Read implements io.Reader interface
func (*ObjectReader) ReadAt ¶
func (f *ObjectReader) ReadAt(p []byte, off int64) (int, error)
ReadAt implements io.ReaderAt interface
type Options ¶
type Options struct { GlobMaxTotalSize int64 GlobMaxObjectsMatched int GlobMaxObjectsListed int64 GlobPageSize int ExtractPolicy *ExtractPolicy GlobPattern string // Although at this point GlobMaxTotalSize and StorageLimitInBytes have same impl but // this is total size the source should consume on disk and is calculated upstream basis how much data one instance has already consumed // across other sources and the instance level limits StorageLimitInBytes int64 // Retain files and only delete during close KeepFilesUntilClose bool // Retainfiles retains files for debugging purposes RetainFiles bool // BatchSizeBytes is the combined size of all files returned in one call to next() BatchSizeBytes int64 // General blob format (json, csv, parquet, etc) Format string // TempDir where temporary files should be stored TempDir string }