Documentation
Overview ¶
Package s3 provides a DataSource which reads data from AWS S3.
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CreateDataFrame ¶
func CreateDataFrame(conf *DataSourceConf, parser sif.DataSourceParser, schema sif.Schema) sif.DataFrame
CreateDataFrame is a factory for DataSources
Types ¶
type DataSource ¶
type DataSource struct {
// contains filtered or unexported fields
}
DataSource is a set of files in an s3 bucket, containing data which will be manipulated according to a DataFrame
func (*DataSource) Analyze ¶
func (fs *DataSource) Analyze() (sif.PartitionMap, error)
Analyze returns a PartitionMap, describing how the source file will be divided into Partitions
func (*DataSource) DeserializeLoader ¶
func (fs *DataSource) DeserializeLoader(bytes []byte) (sif.PartitionLoader, error)
DeserializeLoader creates a PartitionLoader for this DataSource from a serialized representation
func (*DataSource) IsStreaming ¶
func (fs *DataSource) IsStreaming() bool
IsStreaming returns true iff this DataSource provides a continuous stream of data
type DataSourceConf ¶
type DataSourceConf struct {
	Bucket string
	// Prefix limits the response to keys prefixed by this string
	Prefix       string
	Filter       *regexp.Regexp
	RequestPayer string
	// KeyBatchSize must be less than 1000 and represents the number of documents which will
	// be assigned as a batch to a Sif worker at one time. Files are assigned in batches
	// so that workers can download and parse files concurrently.
	KeyBatchSize int64
	// PrefetchLimit is a limit on the number of files which workers will prefetch and store in memory
	PrefetchLimit int
	Session       *session.Session
	Decoder       func([]byte) ([]byte, error)
}
DataSourceConf configures an s3 DataSource
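The Decoder field has the signature func([]byte) ([]byte, error). Assuming it is applied to the raw bytes of each downloaded file before parsing (this reference does not say so explicitly), a gzip decoder might look like the sketch below. The names gunzip and gzipBytes are illustrative and not part of package s3.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gunzip matches the Decoder signature, func([]byte) ([]byte, error).
// It decompresses gzip-encoded input and returns the plain bytes.
func gunzip(in []byte) ([]byte, error) {
	r, err := gzip.NewReader(bytes.NewReader(in))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

// gzipBytes compresses data. It exists only to build input for the demo.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	compressed, err := gzipBytes([]byte("a,b,c\n1,2,3\n"))
	if err != nil {
		panic(err)
	}
	plain, err := gunzip(compressed)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(plain))
}
```

A function such as gunzip could then be assigned to the Decoder field of a DataSourceConf, since the types match.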
type PartitionLoader ¶
type PartitionLoader struct {
// contains filtered or unexported fields
}
PartitionLoader is capable of loading partitions of data from a file
func (*PartitionLoader) GobDecode ¶
func (pl *PartitionLoader) GobDecode(in []byte) error
GobDecode deserializes a PartitionLoader
func (*PartitionLoader) GobEncode ¶
func (pl *PartitionLoader) GobEncode() ([]byte, error)
GobEncode serializes a PartitionLoader
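GobEncode and GobDecode are the method names that the standard encoding/gob package looks for (the gob.GobEncoder and gob.GobDecoder interfaces), which lets a type with unexported fields control its own serialization. The sketch below shows the general pattern on a hypothetical keyBatch type; the fields of the real PartitionLoader are unexported and are not assumed here.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// keyBatch is a hypothetical stand-in for a loader with unexported fields.
type keyBatch struct {
	bucket string
	keys   []string
}

// wireBatch mirrors keyBatch with exported fields so gob can encode it.
type wireBatch struct {
	Bucket string
	Keys   []string
}

// GobEncode implements gob.GobEncoder.
func (b *keyBatch) GobEncode() ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(wireBatch{Bucket: b.bucket, Keys: b.keys}); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// GobDecode implements gob.GobDecoder.
func (b *keyBatch) GobDecode(in []byte) error {
	var w wireBatch
	if err := gob.NewDecoder(bytes.NewReader(in)).Decode(&w); err != nil {
		return err
	}
	b.bucket, b.keys = w.Bucket, w.Keys
	return nil
}

// roundTrip encodes a batch with gob and decodes it into a fresh value.
func roundTrip(in *keyBatch) (*keyBatch, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return nil, err
	}
	out := &keyBatch{}
	if err := gob.NewDecoder(&buf).Decode(out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	out, err := roundTrip(&keyBatch{bucket: "my-bucket", keys: []string{"logs/a.csv", "logs/b.csv"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out.bucket, out.keys)
}
```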
func (*PartitionLoader) Load ¶
func (pl *PartitionLoader) Load(parser sif.DataSourceParser, widestInitialSchema sif.Schema) (sif.PartitionIterator, error)
Load loads partitions of data from a file, returning a PartitionIterator over them
func (*PartitionLoader) ToString ¶
func (pl *PartitionLoader) ToString() string
ToString returns a string representation of this PartitionLoader
type PartitionMap ¶
type PartitionMap struct {
// contains filtered or unexported fields
}
PartitionMap is an iterator producing a sequence of PartitionLoaders
func (*PartitionMap) HasNext ¶
func (pm *PartitionMap) HasNext() bool
HasNext returns true iff there is another PartitionLoader remaining
func (*PartitionMap) Next ¶
func (pm *PartitionMap) Next() sif.PartitionLoader
Next returns the next PartitionLoader for a file
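HasNext and Next follow the usual pull-iterator pattern: the caller checks HasNext before each call to Next. The sketch below illustrates that loop on a hypothetical batchIterator over batches of keys. It does not use the real PartitionMap, whose fields are unexported, and it assumes (as many such iterators do) that calling Next without a remaining element is a caller error.

```go
package main

import "fmt"

// batchIterator is a hypothetical iterator over batches of keys.
type batchIterator struct {
	batches [][]string
	pos     int
}

// HasNext reports whether another batch remains.
func (it *batchIterator) HasNext() bool {
	return it.pos < len(it.batches)
}

// Next returns the next batch and advances the iterator.
// In this sketch it panics if HasNext would have returned false.
func (it *batchIterator) Next() []string {
	batch := it.batches[it.pos]
	it.pos++
	return batch
}

// drain consumes the iterator with the check-then-advance loop.
func drain(it *batchIterator) [][]string {
	var out [][]string
	for it.HasNext() {
		out = append(out, it.Next())
	}
	return out
}

func main() {
	it := &batchIterator{batches: [][]string{{"logs/a.csv", "logs/b.csv"}, {"logs/e.csv"}}}
	for _, batch := range drain(it) {
		fmt.Println(batch)
	}
}
```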