Documentation ¶
Overview ¶
Package imports provides functionality to read data contained in another format and populate a DataFrame. It provides the inverse functionality of the exports package.
Index ¶
- func LoadFromCSV(ctx context.Context, r io.ReadSeeker, options ...CSVLoadOptions) (*dataframe.DataFrame, error)
- func LoadFromJSON(ctx context.Context, r io.ReadSeeker, options ...JSONLoadOptions) (*dataframe.DataFrame, error)
- func LoadFromParquet(ctx context.Context, src source.ParquetFile, opts ...ParquetLoadOptions) (*dataframe.DataFrame, error)
- func LoadFromSQL(ctx context.Context, stmt interface{}, options *SQLLoadOptions, ...) (*dataframe.DataFrame, error)
- type CSVLoadOptions
- type Converter
- type Database
- type GenericDataConverter
- type JSONLoadOptions
- type ParquetLoadOptions
- type SQLLoadOptions
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func LoadFromCSV ¶
func LoadFromCSV(ctx context.Context, r io.ReadSeeker, options ...CSVLoadOptions) (*dataframe.DataFrame, error)
LoadFromCSV will load data from a CSV file.
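A minimal sketch of typical usage follows. The file name "data.csv" and the use of df.Table() to display the result are illustrative assumptions, not part of this package's documented examples.

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/rocketlaunchr/dataframe-go/imports"
)

func main() {
	// Open a CSV file (hypothetical file name).
	// *os.File satisfies the io.ReadSeeker parameter.
	f, err := os.Open("data.csv")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Load with default options; column types default to strings
	// unless DictateDataType or InferDataTypes is used.
	df, err := imports.LoadFromCSV(context.Background(), f)
	if err != nil {
		panic(err)
	}
	fmt.Print(df.Table())
}
```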
func LoadFromJSON ¶
func LoadFromJSON(ctx context.Context, r io.ReadSeeker, options ...JSONLoadOptions) (*dataframe.DataFrame, error)
LoadFromJSON will load data from a JSON Lines (jsonl) file. The first row determines which fields will be imported for subsequent rows.
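A hedged sketch of loading inline jsonl data (the field names are invented for illustration); strings.NewReader satisfies the io.ReadSeeker parameter:

```go
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/rocketlaunchr/dataframe-go/imports"
)

func main() {
	// One JSON object per line (JSON Lines). The fields present in the
	// first row determine which fields are imported for subsequent rows.
	jsonl := strings.NewReader(`{"name":"alice","age":26}
{"name":"bob","age":31}`)

	df, err := imports.LoadFromJSON(context.Background(), jsonl)
	if err != nil {
		panic(err)
	}
	fmt.Print(df.Table())
}
```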
func LoadFromParquet ¶
func LoadFromParquet(ctx context.Context, src source.ParquetFile, opts ...ParquetLoadOptions) (*dataframe.DataFrame, error)
LoadFromParquet will load data from a Parquet file.
NOTE: This function is experimental and the implementation is likely to change.
Example (gist):
import "github.com/xitongsys/parquet-go-source/local"
import "github.com/rocketlaunchr/dataframe-go/imports"

func main() {
	fr, _ := local.NewLocalFileReader("file.parquet")
	defer fr.Close()

	df, _ := imports.LoadFromParquet(ctx, fr)
}
func LoadFromSQL ¶
func LoadFromSQL(ctx context.Context, stmt interface{}, options *SQLLoadOptions, args ...interface{}) (*dataframe.DataFrame, error)
LoadFromSQL will load data from a SQL database. stmt must be a *sql.Stmt or the equivalent from the mysql-go package.
See: https://godoc.org/github.com/rocketlaunchr/mysql-go#Stmt
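A hedged sketch of loading via a prepared statement. The DSN, table, and column names are invented; imports.MySQL is assumed to be one of the package's Database constants (check the package source, since this page's Constants section is empty):

```go
package main

import (
	"context"
	"database/sql"
	"fmt"

	_ "github.com/go-sql-driver/mysql"
	"github.com/rocketlaunchr/dataframe-go/imports"
)

func main() {
	db, err := sql.Open("mysql", "user:pass@/mydb") // hypothetical DSN
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// stmt must be a *sql.Stmt (or the mysql-go equivalent).
	stmt, err := db.Prepare("SELECT id, name FROM users")
	if err != nil {
		panic(err)
	}
	defer stmt.Close()

	opts := &imports.SQLLoadOptions{
		Database: imports.MySQL, // assumed constant for placeholder syntax
	}

	df, err := imports.LoadFromSQL(context.Background(), stmt, opts)
	if err != nil {
		panic(err)
	}
	fmt.Print(df.Table())
}
```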
Types ¶
type CSVLoadOptions ¶
type CSVLoadOptions struct {
	// Comma is the field delimiter.
	// The default value is ',' when CSVLoadOptions is not provided.
	// Comma must be a valid rune and must not be \r, \n,
	// or the Unicode replacement character (0xFFFD).
	Comma rune

	// Comment, if not 0, is the comment character. Lines beginning with the
	// Comment character without preceding whitespace are ignored.
	// With leading whitespace the Comment character becomes part of the
	// field, even if TrimLeadingSpace is true.
	// Comment must be a valid rune and must not be \r, \n,
	// or the Unicode replacement character (0xFFFD).
	// It must also not be equal to Comma.
	Comment rune

	// If TrimLeadingSpace is true, leading white space in a field is ignored.
	// This is done even if the field delimiter, Comma, is white space.
	TrimLeadingSpace bool

	// LargeDataSet should be set to true for large datasets.
	// It will set the capacity of the underlying slices of the DataFrame by performing a basic parse
	// of the full dataset before processing the data fully.
	// Preallocating memory can provide speed improvements. Benchmarks should be performed for your use case.
	LargeDataSet bool

	// DictateDataType is used to inform LoadFromCSV what the true underlying data type is for a given field name.
	// The key must be the case-sensitive field name.
	// The value for a given key must be of the data type of the data.
	// e.g. For a string use "". For an int64 use int64(0). What is relevant is the data type and not the value itself.
	//
	// NOTE: A custom Series must implement the NewSerieser interface and be able to interpret strings to work.
	DictateDataType map[string]interface{}

	// NilValue allows you to set what string value in the CSV file should be interpreted as a nil value for
	// the purposes of insertion.
	//
	// Common values are: NULL, \N, NaN, NA
	NilValue *string

	// InferDataTypes can be set to true if the underlying data type should be automatically detected.
	// Using DictateDataType is the recommended approach (especially for large datasets or memory-constrained systems).
	// DictateDataType always takes precedence when determining the type.
	// If the data type could not be detected, NewSeriesString is used.
	InferDataTypes bool

	// Headers must be set if the CSV file does not contain a header row. This must be nil if the CSV file contains a
	// header row.
	Headers []string
}
CSVLoadOptions is likely to change.
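A hedged sketch of combining several of these options (the field names "Id" and "Name" and the reader r are placeholders for your own data):

```go
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/rocketlaunchr/dataframe-go/imports"
)

func main() {
	r := strings.NewReader(`Id, Name
1, alice
2, NULL`)

	nilVal := "NULL" // rows containing this string become nil values

	opts := imports.CSVLoadOptions{
		TrimLeadingSpace: true,
		NilValue:         &nilVal,
		// Dictate types per field: int64 for "Id", string for "Name".
		DictateDataType: map[string]interface{}{
			"Id":   int64(0),
			"Name": "",
		},
	}

	df, err := imports.LoadFromCSV(context.Background(), r, opts)
	if err != nil {
		panic(err)
	}
	fmt.Print(df.Table())
}
```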
type Converter ¶
type Converter struct {
	ConcreteType  interface{}
	ConverterFunc GenericDataConverter
}
Converter is used to convert input data into a generic data type. This is required when importing data for a Generic Series ("dataframe.SeriesGeneric"). As a special case, if ConcreteType is time.Time, then a SeriesTime is used.
Example:
opts := imports.CSVLoadOptions{
	DictateDataType: map[string]interface{}{
		"Date": imports.Converter{
			ConcreteType: time.Time{},
			ConverterFunc: func(in interface{}) (interface{}, error) {
				return time.Parse("2006-01-02", in.(string))
			},
		},
	},
}
type Database ¶
type Database int
Database is used to specify which database is in use. Different databases have different syntax for placeholders etc.
type GenericDataConverter ¶
type GenericDataConverter func(in interface{}) (interface{}, error)
GenericDataConverter is used to convert input data into a generic data type. This is required when importing data for a Generic Series ("SeriesGeneric").
type JSONLoadOptions ¶
type JSONLoadOptions struct {
	// LargeDataSet should be set to true for large datasets.
	// It will set the capacity of the underlying slices of the DataFrame by performing a basic parse
	// of the full dataset before processing the data fully.
	// Preallocating memory can provide speed improvements. Benchmarks should be performed for your use case.
	LargeDataSet bool

	// DictateDataType is used to inform LoadFromJSON what the true underlying data type is for a given field name.
	// The key must be the case-sensitive field name.
	// The value for a given key must be of the data type of the data.
	// e.g. For a string use "". For an int64 use int64(0). What is relevant is the data type and not the value itself.
	//
	// NOTE: A custom Series must implement the NewSerieser interface and be able to interpret strings to work.
	DictateDataType map[string]interface{}

	// ErrorOnUnknownFields will generate an error if an unknown field is encountered after the first row.
	ErrorOnUnknownFields bool
}
JSONLoadOptions is likely to change.
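A hedged sketch of using these options (field names invented for illustration). With ErrorOnUnknownFields set, a field appearing after the first row that was absent from it causes an error rather than being silently skipped:

```go
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/rocketlaunchr/dataframe-go/imports"
)

func main() {
	jsonl := strings.NewReader(`{"name":"alice","age":26}
{"name":"bob","age":31}`)

	opts := imports.JSONLoadOptions{
		// Fail fast on fields not present in the first row.
		ErrorOnUnknownFields: true,
		// Store "age" as int64 rather than the default type.
		DictateDataType: map[string]interface{}{
			"age": int64(0),
		},
	}

	df, err := imports.LoadFromJSON(context.Background(), jsonl, opts)
	if err != nil {
		panic(err)
	}
	fmt.Print(df.Table())
}
```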
type ParquetLoadOptions ¶
type ParquetLoadOptions struct { }
ParquetLoadOptions is likely to change.
type SQLLoadOptions ¶
type SQLLoadOptions struct {
	// KnownRowCount is used to set the capacity of the underlying slices of the DataFrame.
	// The maximum number of rows supported (on a 64-bit machine) is 9,223,372,036,854,775,807 (half of the 64-bit range).
	// Preallocating memory can provide speed improvements. Benchmarks should be performed for your use case.
	//
	// WARNING: Some databases may allow tables to contain more rows than the maximum supported.
	KnownRowCount *int

	// DictateDataType is used to inform LoadFromSQL what the true underlying data type is for a given column name.
	// The key must be the case-sensitive column name.
	// The value for a given key must be of the data type of the data.
	// e.g. For a string use "". For an int64 use int64(0). What is relevant is the data type and not the value itself.
	//
	// NOTE: A custom Series must implement the NewSerieser interface and be able to interpret strings to work.
	DictateDataType map[string]interface{}

	// Database is used to set the Database.
	Database Database

	// Query can be set to the sql stmt if a *sql.DB, *sql.Tx, *sql.Conn or the equivalent from the mysql-go package is provided.
	//
	// See: https://godoc.org/github.com/rocketlaunchr/mysql-go
	Query string
}
SQLLoadOptions is likely to change.
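A hedged sketch of passing a *sql.DB directly, with the query supplied via the Query field. The DSN, table, and row count are invented, and imports.PostgreSQL is assumed to be one of the package's Database constants (verify against the package source):

```go
package main

import (
	"context"
	"database/sql"
	"fmt"

	_ "github.com/lib/pq"
	"github.com/rocketlaunchr/dataframe-go/imports"
)

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/mydb") // hypothetical DSN
	if err != nil {
		panic(err)
	}
	defer db.Close()

	rowCount := 10000 // preallocate capacity when the row count is known

	opts := &imports.SQLLoadOptions{
		KnownRowCount: &rowCount,
		Database:      imports.PostgreSQL, // assumed constant
		// Query is required here because a *sql.DB (not a *sql.Stmt) is passed.
		Query: "SELECT id, name FROM users",
	}

	df, err := imports.LoadFromSQL(context.Background(), db, opts)
	if err != nil {
		panic(err)
	}
	fmt.Print(df.Table())
}
```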