package file

v0.0.0-...-79c606f

Warning: This package is not in the latest version of its module.

Published: Jun 11, 2022 License: Apache-2.0 Imports: 9 Imported by: 0

Documentation

Overview

Package file provides a DataSource which reads data from a directory of files on disk. Files are assigned to workers in their entirety, so it is favourable if individual files represent roughly equal-sized divisions of data.
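The effect of whole-file assignment on balance can be sketched with a small, self-contained example. It assumes a simple round-robin assignment of files to workers, which is an illustrative choice and not necessarily how sif schedules work; `loadPerWorker` and the byte sizes are hypothetical.

```go
package main

import "fmt"

// loadPerWorker assigns whole files (represented only by their sizes in bytes)
// to workers in round-robin order and returns the total bytes per worker.
// Hypothetical illustration: the real scheduling strategy may differ.
func loadPerWorker(sizes []int64, workers int) []int64 {
	if workers <= 0 {
		return nil
	}
	loads := make([]int64, workers)
	for i, size := range sizes {
		// A file is never split, so its full size lands on one worker.
		loads[i%workers] += size
	}
	return loads
}

func main() {
	// Four equally sized files spread evenly over two workers.
	fmt.Println(loadPerWorker([]int64{100, 100, 100, 100}, 2))
	// The same total volume with one oversized file leaves one worker
	// with almost all of the data.
	fmt.Println(loadPerWorker([]int64{370, 10, 10, 10}, 2))
}
```

Because a file cannot be divided between workers, the largest file sets a lower bound on the busiest worker's load, which is why roughly equal-sized files are preferable.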

Constants

This section is empty.

Variables

This section is empty.

Functions

func CreateDataFrame

func CreateDataFrame(conf *DataSourceConf, parser sif.DataSourceParser, schema sif.Schema) sif.DataFrame

CreateDataFrame is a factory for file DataSources: it takes a DataSourceConf, a parser and a schema, and returns a sif.DataFrame

Types

type DataSource

type DataSource struct {
	// contains filtered or unexported fields
}

DataSource is a set of files containing data which will be manipulated according to a DataFrame

func (*DataSource) Analyze

func (fs *DataSource) Analyze() (sif.PartitionMap, error)

Analyze returns a PartitionMap, describing how the source file will be divided into Partitions

func (*DataSource) DeserializeLoader

func (fs *DataSource) DeserializeLoader(bytes []byte) (sif.PartitionLoader, error)

DeserializeLoader creates a PartitionLoader for this DataSource from a serialized representation

func (*DataSource) IsStreaming

func (fs *DataSource) IsStreaming() bool

IsStreaming returns true iff this DataSource provides a continuous stream of data

type DataSourceConf

type DataSourceConf struct {
	Glob    string
	Decoder func([]byte) ([]byte, error)
}

DataSourceConf configures a file DataSource
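As a rough sketch of how the Decoder field might be used, the example below supplies a gzip decoder with the required `func([]byte) ([]byte, error)` signature. It assumes that Decoder receives the raw bytes of a file and returns the decoded bytes, which the signature suggests but the documentation does not state. The `fileConf` type is a local stand-in for DataSourceConf so the example compiles without the sif module, and the glob pattern and helper names are hypothetical.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// fileConf mirrors the exported fields of DataSourceConf.
// It is a local stand-in, not the real type.
type fileConf struct {
	Glob    string
	Decoder func([]byte) ([]byte, error)
}

// gunzip matches the Decoder signature and decompresses gzip data.
func gunzip(raw []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// gzipBytes compresses data; it exists only to build sample input.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Hypothetical pattern for gzip-compressed JSON Lines files.
	conf := &fileConf{Glob: "data/*.jsonl.gz", Decoder: gunzip}

	compressed, err := gzipBytes([]byte(`{"id": 1}`))
	if err != nil {
		panic(err)
	}
	decoded, err := conf.Decoder(compressed)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(decoded))
}
```

With the real package, the same function value would presumably be assigned to the Decoder field of a DataSourceConf before passing it to CreateDataFrame.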

type PartitionLoader

type PartitionLoader struct {
	// contains filtered or unexported fields
}

PartitionLoader is capable of loading partitions of data from a file

func (*PartitionLoader) GobDecode

func (pl *PartitionLoader) GobDecode(in []byte) error

GobDecode deserializes a PartitionLoader

func (*PartitionLoader) GobEncode

func (pl *PartitionLoader) GobEncode() ([]byte, error)

GobEncode serializes a PartitionLoader
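GobEncode and GobDecode are the methods of the standard `encoding/gob` GobEncoder and GobDecoder interfaces, which let a type with only unexported fields control its own wire format. The sketch below shows that pattern on a hypothetical `loader` type with a single `path` field; the actual fields and encoding of PartitionLoader are not documented here and may differ.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// loader is a hypothetical stand-in for PartitionLoader. Its only field is
// unexported, so gob needs the custom methods below to encode it.
type loader struct {
	path string
}

// GobEncode serializes the loader as the raw bytes of its path.
func (l *loader) GobEncode() ([]byte, error) {
	return []byte(l.path), nil
}

// GobDecode restores the loader from bytes produced by GobEncode.
func (l *loader) GobDecode(in []byte) error {
	l.path = string(in)
	return nil
}

// roundTrip encodes a loader with gob and decodes it into a fresh value.
func roundTrip(l *loader) (*loader, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(l); err != nil {
		return nil, err
	}
	out := &loader{}
	if err := gob.NewDecoder(&buf).Decode(out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	restored, err := roundTrip(&loader{path: "data/part-0.csv"})
	if err != nil {
		panic(err)
	}
	fmt.Println(restored.path)
}
```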

func (*PartitionLoader) Load

Load is capable of loading partitions of data from a file

func (*PartitionLoader) ToString

func (pl *PartitionLoader) ToString() string

ToString returns a string representation of this PartitionLoader

type PartitionMap

type PartitionMap struct {
	// contains filtered or unexported fields
}

PartitionMap is an iterator producing a sequence of PartitionLoaders

func (*PartitionMap) HasNext

func (pm *PartitionMap) HasNext() bool

HasNext returns true iff there is another PartitionLoader remaining

func (*PartitionMap) Next

func (pm *PartitionMap) Next() sif.PartitionLoader

Next returns the next PartitionLoader for a file
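The HasNext/Next pair follows the usual iterator pattern: check HasNext before each call to Next. The sketch below shows that loop on a hypothetical `pathIterator` that yields file paths instead of PartitionLoaders. How the real PartitionMap behaves when Next is called after exhaustion is not documented here, so the stand-in simply requires callers to check HasNext first.

```go
package main

import "fmt"

// pathIterator is a hypothetical stand-in for PartitionMap that yields
// file paths rather than PartitionLoaders.
type pathIterator struct {
	paths []string
	next  int
}

// HasNext reports whether another path remains.
func (it *pathIterator) HasNext() bool {
	return it.next < len(it.paths)
}

// Next returns the next path and advances the iterator.
// Callers must check HasNext first; calling Next when exhausted panics.
func (it *pathIterator) Next() string {
	p := it.paths[it.next]
	it.next++
	return p
}

// drain consumes the iterator and collects everything it yields.
func drain(it *pathIterator) []string {
	var out []string
	for it.HasNext() {
		out = append(out, it.Next())
	}
	return out
}

func main() {
	it := &pathIterator{paths: []string{"a.csv", "b.csv", "c.csv"}}
	for _, p := range drain(it) {
		fmt.Println("would load:", p)
	}
}
```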
