Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func NewIterator ¶
func NewIterator(ctx context.Context, bucket *blob.Bucket, opts Options, l *zap.Logger) (drivers.FileIterator, error)
NewIterator returns an iterator for downloading objects matching a glob pattern and extract policy. The downloaded objects will be stored in a temporary directory with the same file hierarchy as in the bucket, enabling parsing of hive partitioning on the downloaded files. The client should call Close() once done to release all resources. Calling Close() on the iterator will also close the bucket.
Types ¶
type ExtractPolicy ¶ added in v0.34.0
type ExtractPolicy struct { RowsStrategy ExtractPolicyStrategy RowsLimitBytes uint64 FilesStrategy ExtractPolicyStrategy FilesLimit uint64 }
func ParseExtractPolicy ¶ added in v0.34.0
func ParseExtractPolicy(cfg map[string]any) (*ExtractPolicy, error)
type ExtractPolicyStrategy ¶ added in v0.34.0
type ExtractPolicyStrategy int
const ( ExtractPolicyStrategyUnspecified ExtractPolicyStrategy = 0 ExtractPolicyStrategyHead ExtractPolicyStrategy = 1 ExtractPolicyStrategyTail ExtractPolicyStrategy = 2 )
func (ExtractPolicyStrategy) String ¶ added in v0.34.0
func (s ExtractPolicyStrategy) String() string
type ObjectReader ¶
type ObjectReader struct {
// contains filtered or unexported fields
}
ObjectReader reads range of bytes from cloud objects implements io.ReaderAt and io.Seeker interfaces
func NewBlobObjectReader ¶
func NewBlobObjectReader(ctx context.Context, bucket *blob.Bucket, obj *blob.ListObject) *ObjectReader
NewBlobObjectReader returns new instance of ObjectReader
func (*ObjectReader) Close ¶
func (f *ObjectReader) Close() error
Close frees up resources if any clients should call Close once done with reader
func (*ObjectReader) Read ¶
func (f *ObjectReader) Read(p []byte) (int, error)
Read implements io.Reader interface
func (*ObjectReader) ReadAt ¶
func (f *ObjectReader) ReadAt(p []byte, off int64) (int, error)
ReadAt implements io.ReaderAt interface
type Options ¶
type Options struct { GlobMaxTotalSize int64 GlobMaxObjectsMatched int GlobMaxObjectsListed int64 GlobPageSize int ExtractPolicy *ExtractPolicy GlobPattern string // Although at this point GlobMaxTotalSize and StorageLimitInBytes have same impl but // this is total size the source should consume on disk and is calculated upstream basis how much data one instance has already consumed // across other sources and the instance level limits StorageLimitInBytes int64 // Retain files and only delete during close KeepFilesUntilClose bool // Retainfiles retains files for debugging purposes RetainFiles bool // BatchSizeBytes is the combined size of all files returned in one call to next() BatchSizeBytes int64 // General blob format (json, csv, parquet, etc) Format string // TempDir where temporary files should be stored TempDir string }