Documentation ¶
Overview ¶
Package dataframe provides an implementation of data frames and methods to subset, join, mutate, set, arrange, summarize, etc.
Index ¶
- type CsvReader
- type CsvWriter
- type CustomTrimer
- type DataFrame
- func LoadMaps(maps []map[string]interface{}, options ...LoadOption) DataFrame
- func LoadMatrix(mat mat64.Matrix) DataFrame
- func LoadRecords(records [][]string, options ...LoadOption) DataFrame
- func LoadRecordsNoCopy(records [][]string, options ...LoadOption) DataFrame
- func New(se ...series.Series) DataFrame
- func NewNoCopy(se ...series.Series) DataFrame
- func ReadCSV(r io.Reader, options ...LoadOption) DataFrame
- func ReadJSON(r io.Reader, options ...LoadOption) DataFrame
- func (df DataFrame) Arrange(order ...Order) DataFrame
- func (df DataFrame) CBind(dfb DataFrame) DataFrame
- func (df DataFrame) CBindNoCopy(dfb DataFrame) DataFrame
- func (df DataFrame) Capply(f func(series.Series) series.Series) DataFrame
- func (df DataFrame) Col(colname string) series.Series
- func (df DataFrame) ColFilter(fn func(otherInfo interface{}) bool) series.Series
- func (df DataFrame) ColIndex(s string) int
- func (df DataFrame) ColNoCopy(colname string) series.Series
- func (df DataFrame) Copy() DataFrame
- func (df DataFrame) CrossJoin(b DataFrame) DataFrame
- func (df DataFrame) Dims() (r, c int)
- func (df DataFrame) Filter(filters ...F) DataFrame
- func (d DataFrame) Group(cols ...string) GroupedDataFrame
- func (df DataFrame) InnerJoin(b DataFrame, keys ...string) DataFrame
- func (df DataFrame) InnerJoinHash(b DataFrame, keys ...string) DataFrame
- func (df DataFrame) LeftJoin(b DataFrame, keys ...string) DataFrame
- func (df DataFrame) LeftJoinHash(b DataFrame, keys ...string) DataFrame
- func (df DataFrame) Levels(col string) map[string]int
- func (df DataFrame) Maps() []map[string]interface{}
- func (df DataFrame) Matrix() mat64.Matrix
- func (df DataFrame) Merge(b DataFrame, keys ...string) Merge
- func (df DataFrame) Mutate(s series.Series) DataFrame
- func (df DataFrame) MutateNoCopy(s series.Series) DataFrame
- func (df DataFrame) Names() []string
- func (df DataFrame) Ncol() int
- func (df DataFrame) Nrow() int
- func (df DataFrame) OuterJoin(b DataFrame, keys ...string) DataFrame
- func (df DataFrame) OuterJoinHash(b DataFrame, keys ...string) DataFrame
- func (df DataFrame) RBind(dfb DataFrame) DataFrame
- func (df DataFrame) RBindNoCopy(dfb DataFrame) DataFrame
- func (df DataFrame) Rapply(f func(series.Series) series.Series) DataFrame
- func (df DataFrame) RapplySeries(name string, seriesType series.Type, f func(series.Series) interface{}) series.Series
- func (df DataFrame) Records() [][]string
- func (df DataFrame) Rename(newname, oldname string) DataFrame
- func (df DataFrame) RenameNoCopy(newname, oldname string) DataFrame
- func (df DataFrame) Reshape(id, col, val string, defaultValue interface{}) DataFrame
- func (df DataFrame) RightJoin(b DataFrame, keys ...string) DataFrame
- func (df DataFrame) RightJoinHash(b DataFrame, keys ...string) DataFrame
- func (df DataFrame) Select(indexes SelectIndexes) DataFrame
- func (df DataFrame) Set(indexes series.Indexes, newvalues DataFrame) DataFrame
- func (df DataFrame) SetNames(colnames []string) error
- func (df DataFrame) String() (str string)
- func (df DataFrame) Subset(indexes series.Indexes) DataFrame
- func (df DataFrame) SubsetNoColumnCopy(indexes series.Indexes) DataFrame
- func (df DataFrame) Summarize(colname string) func(...func(series.Series) (series.Element, error)) DataFrame
- func (df DataFrame) Types() []series.Type
- func (df DataFrame) WriteCSV(w io.Writer) error
- func (df DataFrame) WriteJSON(w io.Writer) error
- func (df DataFrame) WriteTo(dw Writer) error
- type F
- type GroupedDataFrame
- type JSONReader
- type JSONWriter
- type LoadOption
- func DefaultType(t series.Type) LoadOption
- func DetectTypeThreshold(rate float64) LoadOption
- func DetectTypes(b bool) LoadOption
- func HasHeader(b bool) LoadOption
- func NaNValues(nanValues []string) LoadOption
- func OnCustomTrimer(fn CustomTrimer) LoadOption
- func TrimHeaderString(fn func(r rune) bool) LoadOption
- func WithDatetimeFormat(datetimeFormat string) LoadOption
- func WithTypes(coltypes map[string]series.Type) LoadOption
- type Merge
- func (m Merge) InnerJoin() DataFrame
- func (m Merge) LeftJoin() DataFrame
- func (m Merge) OuterJoin() DataFrame
- func (m Merge) RightJoin() DataFrame
- func (m Merge) WithCombine(fn func(aSerie, bSerie series.Series) bool) Merge
- func (m Merge) WithResultHeader(fn func(a, b series.Series) (string, interface{}, bool)) Merge
- type Order
- type Reader
- type SelectIndexes
- type Writer
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CsvWriter ¶ added in v0.8.2
type CsvWriter struct {
// contains filtered or unexported fields
}
CsvWriter is a CSV writer.
func NewCsvWriter ¶ added in v0.8.2
NewCsvWriter creates a new instance of CsvWriter.
type CustomTrimer ¶ added in v0.8.2
CustomTrimer is a custom trimmer for raw data.
type DataFrame ¶
type DataFrame struct {
	Err error
	// contains filtered or unexported fields
}
DataFrame is a data structure designed for operating on table-like data (such as Excel sheets, CSV files, or SQL table results) where every column must keep type integrity. As a general rule of thumb, variables are stored in columns, and every row of a DataFrame represents an observation of each variable.
In the real world, data is very messy and sometimes contains non-measurements or missing values. For this reason, DataFrame supports NaN elements and allows the most common data cleaning and munging operations such as subsetting, filtering, and type transformations. In addition, this library provides the functions needed to concatenate DataFrames (by rows or columns), different join operations (inner, outer, left, right, cross), and the ability to read and write different formats (CSV/JSON).
func LoadMaps ¶
func LoadMaps(maps []map[string]interface{}, options ...LoadOption) DataFrame
LoadMaps creates a new DataFrame based on the given maps. This function assumes that every map in the slice represents a row of observations.
Example ¶
package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
)

func main() {
	df := dataframe.LoadMaps(
		[]map[string]interface{}{
			{
				"A": "a",
				"B": 1,
				"C": true,
				"D": 0,
			},
			{
				"A": "b",
				"B": 2,
				"C": true,
				"D": 0.5,
			},
		},
	)
	fmt.Println(df)
}
func LoadMatrix ¶ added in v0.8.0
LoadMatrix loads the given mat64.Matrix as a DataFrame
func LoadRecords ¶
func LoadRecords(records [][]string, options ...LoadOption) DataFrame
LoadRecords creates a new DataFrame based on the given records.
Example ¶
package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
	)
	fmt.Println(df)
}
Example (Options) ¶
package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
	"github.com/kniren/gota/series"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
		dataframe.DetectTypes(false),
		dataframe.DefaultType(series.Float),
		dataframe.WithTypes(map[string]series.Type{
			"A": series.String,
			"D": series.Bool,
		}),
	)
	fmt.Println(df)
}
func LoadRecordsNoCopy ¶ added in v0.8.2
func LoadRecordsNoCopy(records [][]string, options ...LoadOption) DataFrame
LoadRecordsNoCopy creates a new DataFrame based on the given records without copying them. The records must already be packed in columns.
func New ¶
New is the generic DataFrame constructor
Example ¶
package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
	"github.com/kniren/gota/series"
)

func main() {
	df := dataframe.New(
		series.New([]string{"b", "a"}, series.String, "COL.1"),
		series.New([]int{1, 2}, series.Int, "COL.2"),
		series.New([]float64{3.0, 4.0}, series.Float, "COL.3"),
	)
	fmt.Println(df)
}
func ReadCSV ¶
func ReadCSV(r io.Reader, options ...LoadOption) DataFrame
ReadCSV reads a CSV file from an io.Reader and builds a DataFrame with the resulting records.
Example ¶
package main

import (
	"fmt"
	"strings"

	"github.com/kniren/gota/dataframe"
)

func main() {
	csvStr := `
Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,17,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,NA,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United States",2012-02-01,32,321.31,54320
Spain,2012-02-01,66,555.42,00241
`
	df := dataframe.ReadCSV(strings.NewReader(csvStr))
	fmt.Println(df)
}
func ReadJSON ¶
func ReadJSON(r io.Reader, options ...LoadOption) DataFrame
ReadJSON reads a JSON array from an io.Reader and builds a DataFrame with the resulting records.
Example ¶
package main

import (
	"fmt"
	"strings"

	"github.com/kniren/gota/dataframe"
)

func main() {
	jsonStr := `[{"COL.2":1,"COL.3":3},{"COL.1":5,"COL.2":2,"COL.3":2},{"COL.1":6,"COL.2":3,"COL.3":1}]`
	df := dataframe.ReadJSON(strings.NewReader(jsonStr))
	fmt.Println(df)
}
func (DataFrame) Arrange ¶ added in v0.8.0
Arrange sorts the rows of a DataFrame according to the given Order.
Example ¶
package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"b", "4", "6.0", "true"},
			{"c", "3", "6.0", "false"},
			{"a", "2", "7.1", "false"},
		},
	)
	sorted := df.Arrange(
		dataframe.Sort("A"),
		dataframe.RevSort("B"),
	)
	fmt.Println(sorted)
}
func (DataFrame) CBindNoCopy ¶ added in v0.8.4
CBindNoCopy combines the columns of two DataFrames without copying them.
func (DataFrame) Capply ¶ added in v0.8.0
Capply applies the given function to the columns of a DataFrame
func (DataFrame) Col ¶
Col returns the Series with the given column name contained in the DataFrame.
func (DataFrame) ColIndex ¶ added in v0.8.2
ColIndex returns the index of the column with name `s`. If it fails to find the column it returns -1 instead.
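The -1 sentinel means callers should check the result before using it as an index. The following is a minimal sketch of that contract in plain Go; `colIndex` is a hypothetical stand-in that works on a slice of column names and is not the package's implementation.

```go
package main

import "fmt"

// colIndex is a hypothetical stand-in for DataFrame.ColIndex. It returns the
// position of the first column named s, or -1 when no column has that name.
func colIndex(names []string, s string) int {
	for i, name := range names {
		if name == s {
			return i
		}
	}
	return -1
}

func main() {
	names := []string{"A", "B", "C"}

	// Guard against the -1 sentinel before indexing with the result.
	for _, want := range []string{"B", "Z"} {
		if i := colIndex(names, want); i >= 0 {
			fmt.Printf("column %q is at index %d\n", want, i)
		} else {
			fmt.Printf("column %q not found\n", want)
		}
	}
}
```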
func (DataFrame) CrossJoin ¶
CrossJoin returns a DataFrame containing the cross join of two DataFrames.
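A cross join pairs every row of the first table with every row of the second, so the result has rows(a) × rows(b) rows. The sketch below illustrates the idea on plain string records; `crossJoin` is a hypothetical helper and says nothing about how the package names or orders the resulting columns.

```go
package main

import "fmt"

// crossJoin is a hypothetical helper that builds the Cartesian product of two
// record sets: each output row is a row of a followed by a row of b.
func crossJoin(a, b [][]string) [][]string {
	out := make([][]string, 0, len(a)*len(b))
	for _, ra := range a {
		for _, rb := range b {
			row := make([]string, 0, len(ra)+len(rb))
			row = append(row, ra...)
			row = append(row, rb...)
			out = append(out, row)
		}
	}
	return out
}

func main() {
	a := [][]string{{"a", "1"}, {"b", "2"}}
	b := [][]string{{"x"}, {"y"}, {"z"}}
	for _, row := range crossJoin(a, b) {
		fmt.Println(row)
	}
}
```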
func (DataFrame) Filter ¶
Filter filters the rows of a DataFrame based on the given filters. All filters passed to a single Filter call are combined with OR, whereas chained Filter calls are combined with AND: each call further restricts the result of the previous one.
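The OR-within-a-call and AND-across-calls behavior can be illustrated without the library. The sketch below is only a model of those semantics: `row`, `pred`, and `filter` are hypothetical names, and the data mirrors the records used in the package example.

```go
package main

import "fmt"

// row is a hypothetical record type used only in this sketch.
type row struct {
	A string
	B int
	D bool
}

// pred is a hypothetical stand-in for a single dataframe.F filter.
type pred func(row) bool

// filter keeps a row when at least one predicate matches (OR semantics).
func filter(rows []row, preds ...pred) []row {
	var out []row
	for _, r := range rows {
		for _, p := range preds {
			if p(r) {
				out = append(out, r)
				break
			}
		}
	}
	return out
}

func main() {
	rows := []row{
		{"a", 4, true},
		{"k", 5, true},
		{"k", 4, true},
		{"a", 2, false},
	}

	// One call, two predicates: A == "a" OR B > 4.
	fil := filter(rows,
		func(r row) bool { return r.A == "a" },
		func(r row) bool { return r.B > 4 },
	)

	// Chained call: (A == "a" OR B > 4) AND D == true.
	fil2 := filter(fil, func(r row) bool { return r.D })

	fmt.Println(len(fil), len(fil2))
}
```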
Example ¶
package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
	"github.com/kniren/gota/series"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
	)
	fil := df.Filter(
		dataframe.F{
			Colname:    "A",
			Comparator: series.Eq,
			Comparando: "a",
		},
		dataframe.F{
			Colname:    "B",
			Comparator: series.Greater,
			Comparando: 4,
		},
	)
	fil2 := fil.Filter(
		dataframe.F{
			Colname:    "D",
			Comparator: series.Eq,
			Comparando: true,
		},
	)
	fmt.Println(fil)
	fmt.Println(fil2)
}
func (DataFrame) Group ¶ added in v0.8.2
func (d DataFrame) Group(cols ...string) GroupedDataFrame
Group creates a GroupedDataFrame grouped by the given columns.
func (DataFrame) InnerJoin ¶
InnerJoin returns a DataFrame containing the inner join of two DataFrames.
func (DataFrame) InnerJoinHash ¶ added in v0.8.2
InnerJoinHash returns a DataFrame containing the inner join of two DataFrames.
func (DataFrame) LeftJoin ¶
LeftJoin returns a DataFrame containing the left join of two DataFrames.
func (DataFrame) LeftJoinHash ¶ added in v0.8.2
LeftJoinHash returns a DataFrame containing the left outer join of two DataFrames.
func (DataFrame) Matrix ¶ added in v0.8.0
Matrix returns the mat64.Matrix representation of a DataFrame
func (DataFrame) Merge ¶ added in v0.8.2
Merge returns a Merge struct holding information about the merge.
func (DataFrame) Mutate ¶
Mutate changes a column of the DataFrame with the given Series or adds it as a new column if the column name does not exist.
Example ¶
package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
	"github.com/kniren/gota/series"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
	)
	// Change column C with a new one
	mut := df.Mutate(
		series.New([]string{"a", "b", "c", "d"}, series.String, "C"),
	)
	// Add a new column E
	mut2 := df.Mutate(
		series.New([]string{"a", "b", "c", "d"}, series.String, "E"),
	)
	fmt.Println(mut)
	fmt.Println(mut2)
}
func (DataFrame) MutateNoCopy ¶ added in v0.8.4
MutateNoCopy changes a column of the DataFrame with the given Series or adds it as a new column if the column name does not exist. It does not copy memory.
func (DataFrame) OuterJoin ¶
OuterJoin returns a DataFrame containing the outer join of two DataFrames.
func (DataFrame) OuterJoinHash ¶ added in v0.8.2
OuterJoinHash returns a DataFrame containing the outer join of two DataFrames.
func (DataFrame) RBind ¶
RBind matches the column names of two DataFrames and returns the combination of the rows of both of them.
func (DataFrame) RBindNoCopy ¶ added in v0.8.4
RBindNoCopy matches the column names of two DataFrames and returns the combination of the rows of both of them. It does not copy the series.
func (DataFrame) Rapply ¶ added in v0.8.0
Rapply applies the given function to the rows of a DataFrame. Before the function is applied, the elements of each row are cast to a Series of a specific type, chosen in order of priority: String -> Time -> Float -> Int -> Bool. The same casting takes place after the function is applied, to equalize the types of the columns.
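One way to read the priority list is that a row takes the first type in the list that appears among its columns, so a row spanning Int, Float, and Bool columns would be handled as Float. The sketch below models that reading only; the string type tags and `rowType` are hypothetical and are not taken from the package.

```go
package main

import "fmt"

// priority lists hypothetical type tags from highest to lowest priority,
// mirroring String -> Time -> Float -> Int -> Bool.
var priority = []string{"string", "time", "float", "int", "bool"}

// rowType returns the highest-priority type found among the column types, or
// an empty string when none of the types is known.
func rowType(colTypes []string) string {
	for _, p := range priority {
		for _, t := range colTypes {
			if t == p {
				return p
			}
		}
	}
	return ""
}

func main() {
	fmt.Println(rowType([]string{"int", "float", "bool"}))
	fmt.Println(rowType([]string{"bool", "string", "int"}))
	fmt.Println(rowType([]string{"bool", "bool"}))
}
```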
func (DataFrame) RapplySeries ¶ added in v0.8.2
func (DataFrame) RenameNoCopy ¶ added in v0.8.4
RenameNoCopy changes the name of one of the columns of a DataFrame without copying it.
func (DataFrame) RightJoin ¶
RightJoin returns a DataFrame containing the right join of two DataFrames.
func (DataFrame) RightJoinHash ¶ added in v0.8.2
RightJoinHash returns a DataFrame containing the right outer join of two DataFrames.
func (DataFrame) Select ¶
func (df DataFrame) Select(indexes SelectIndexes) DataFrame
Select returns a DataFrame containing only the given columns.
Example ¶
package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
	)
	sel1 := df.Select([]int{0, 2})
	sel2 := df.Select([]string{"A", "C"})
	fmt.Println(sel1)
	fmt.Println(sel2)
}
func (DataFrame) Set ¶
Set updates the values of a DataFrame for the rows selected via indexes.
Example ¶
package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
	"github.com/kniren/gota/series"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
	)
	df2 := df.Set(
		series.Ints([]int{0, 2}),
		dataframe.LoadRecords(
			[][]string{
				{"A", "B", "C", "D"},
				{"b", "4", "6.0", "true"},
				{"c", "3", "6.0", "false"},
			},
		),
	)
	fmt.Println(df2)
}
func (DataFrame) SetNames ¶
SetNames changes the column names of a DataFrame to the ones passed as an argument.
func (DataFrame) Subset ¶
Subset returns a subset of the rows of the original DataFrame based on the Series subsetting indexes.
Example ¶
package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
	)
	sub := df.Subset([]int{0, 2})
	fmt.Println(sub)
}
func (DataFrame) SubsetNoColumnCopy ¶ added in v0.8.2
func (DataFrame) Summarize ¶ added in v0.8.2
func (df DataFrame) Summarize(colname string) func(...func(series.Series) (series.Element, error)) DataFrame
Summarize runs a series of functions on a column and returns the results in a new DataFrame.
type F ¶
type F struct {
	Colname    string
	Comparator series.Comparator
	Comparando interface{}
}
F is the filtering structure
type GroupedDataFrame ¶ added in v0.8.2
type GroupedDataFrame struct {
	DataFrame
	// contains filtered or unexported fields
}
GroupedDataFrame is a DataFrame grouped by columns.
func (GroupedDataFrame) Summarize ¶ added in v0.8.2
func (g GroupedDataFrame) Summarize(f func(DataFrame) series.Series) DataFrame
func (GroupedDataFrame) SummarizeAsync ¶ added in v0.8.2
func (g GroupedDataFrame) SummarizeAsync(f func(DataFrame) series.Series) DataFrame
type JSONReader ¶ added in v0.8.2
type JSONReader struct { }
JSONReader reads JSON into a DataFrame.
func (JSONReader) Read ¶ added in v0.8.2
func (jr JSONReader) Read(r io.Reader, options ...LoadOption) DataFrame
type JSONWriter ¶ added in v0.8.2
type JSONWriter struct {
// contains filtered or unexported fields
}
JSONWriter is a JSON writer.
func MakeJSONWriter ¶ added in v0.8.2
func MakeJSONWriter(w io.Writer) JSONWriter
MakeJSONWriter creates a new instance of JSONWriter
func (*JSONWriter) Write ¶ added in v0.8.2
func (w *JSONWriter) Write(df DataFrame) error
type LoadOption ¶ added in v0.8.0
type LoadOption func(*loadOptions)
LoadOption is the type used to configure the load of elements
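The signature `func(*loadOptions)` is the functional options pattern: each option is a closure that mutates a private configuration struct, and the loader applies them in order on top of its defaults. The sketch below reproduces the pattern in isolation. The fields of `loadOptions`, the default values, and the `applyOptions` helper are assumptions made for illustration, since the real struct is unexported and not documented here.

```go
package main

import "fmt"

// loadOptions is a hypothetical, trimmed-down configuration struct.
type loadOptions struct {
	hasHeader   bool
	detectTypes bool
	nanValues   []string
}

// LoadOption mutates a loadOptions value.
type LoadOption func(*loadOptions)

// HasHeader returns an option that sets the hasHeader field.
func HasHeader(b bool) LoadOption {
	return func(c *loadOptions) { c.hasHeader = b }
}

// DetectTypes returns an option that sets the detectTypes field.
func DetectTypes(b bool) LoadOption {
	return func(c *loadOptions) { c.detectTypes = b }
}

// NaNValues returns an option that replaces the list of NaN markers.
func NaNValues(nanValues []string) LoadOption {
	return func(c *loadOptions) { c.nanValues = nanValues }
}

// applyOptions is a hypothetical helper: it starts from assumed defaults and
// applies each option in order, so later options override earlier ones.
func applyOptions(options ...LoadOption) loadOptions {
	cfg := loadOptions{hasHeader: true, detectTypes: true, nanValues: []string{"NA"}}
	for _, option := range options {
		option(&cfg)
	}
	return cfg
}

func main() {
	cfg := applyOptions(HasHeader(false), NaNValues([]string{"NA", "null"}))
	fmt.Printf("%+v\n", cfg)
}
```

Because options are plain values, callers can build a slice of them once and reuse it across several load calls.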
func DefaultType ¶ added in v0.8.0
func DefaultType(t series.Type) LoadOption
DefaultType sets the defaultType option for loadOptions.
func DetectTypeThreshold ¶ added in v0.8.2
func DetectTypeThreshold(rate float64) LoadOption
DetectTypeThreshold sets the detectTypeThreshold option for loadOptions. The value must be in the range [0..100].
func DetectTypes ¶ added in v0.8.0
func DetectTypes(b bool) LoadOption
DetectTypes sets the detectTypes option for loadOptions.
func HasHeader ¶ added in v0.8.0
func HasHeader(b bool) LoadOption
HasHeader sets the hasHeader option for loadOptions.
func NaNValues ¶ added in v0.8.0
func NaNValues(nanValues []string) LoadOption
NaNValues sets which values are to be parsed as NaN.
func OnCustomTrimer ¶ added in v0.8.2
func OnCustomTrimer(fn CustomTrimer) LoadOption
OnCustomTrimer sets the CustomTrimer callback used during parsing.
func TrimHeaderString ¶ added in v0.8.2
func TrimHeaderString(fn func(r rune) bool) LoadOption
TrimHeaderString trims the header characters for which fn returns true.
func WithDatetimeFormat ¶ added in v0.8.5
func WithDatetimeFormat(datetimeFormat string) LoadOption
type Merge ¶ added in v0.8.2
type Merge struct {
// contains filtered or unexported fields
}
Merge struct definition
func (Merge) WithCombine ¶ added in v0.8.2
WithCombine specifies that identical columns are merged into one.
type Order ¶ added in v0.8.0
Order is the ordering structure
type Reader ¶ added in v0.8.2
type Reader interface {
Read(r io.Reader, options ...LoadOption) DataFrame
}
Reader is the interface for reading a DataFrame from an io.Reader.
type SelectIndexes ¶
type SelectIndexes interface{}
SelectIndexes are the supported indexes used for the DataFrame.Select method. Currently supported are:
int              // Matches the given index number
[]int            // Matches all given index numbers
[]bool           // Matches all columns marked as true
string           // Matches the column with the matching column name
[]string         // Matches all columns with the matching column names
Series [Int]     // Same as []int
Series [Bool]    // Same as []bool
Series [String]  // Same as []string
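Since SelectIndexes is an empty interface, the accepted forms have to be told apart at run time, which in Go is typically done with a type switch. The sketch below shows how such a dispatch could look for the plain Go types in the list. `resolveColumns` is a hypothetical helper, the Series variants are left out, and the error handling (including the requirement that a boolean mask has one entry per column) is an assumption, not the package's documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveColumns is a hypothetical helper that turns a SelectIndexes-style
// value into column positions for the given column names.
func resolveColumns(names []string, indexes interface{}) ([]int, error) {
	find := func(name string) (int, error) {
		for i, n := range names {
			if n == name {
				return i, nil
			}
		}
		return -1, fmt.Errorf("column %q not found", name)
	}
	inRange := func(i int) error {
		if i < 0 || i >= len(names) {
			return fmt.Errorf("index %d out of range", i)
		}
		return nil
	}

	switch v := indexes.(type) {
	case int:
		if err := inRange(v); err != nil {
			return nil, err
		}
		return []int{v}, nil
	case []int:
		for _, i := range v {
			if err := inRange(i); err != nil {
				return nil, err
			}
		}
		return v, nil
	case []bool:
		// Assumption: the mask must have one entry per column.
		if len(v) != len(names) {
			return nil, errors.New("mask length must match the number of columns")
		}
		var out []int
		for i, keep := range v {
			if keep {
				out = append(out, i)
			}
		}
		return out, nil
	case string:
		i, err := find(v)
		if err != nil {
			return nil, err
		}
		return []int{i}, nil
	case []string:
		out := make([]int, 0, len(v))
		for _, name := range v {
			i, err := find(name)
			if err != nil {
				return nil, err
			}
			out = append(out, i)
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unsupported index type %T", indexes)
	}
}

func main() {
	names := []string{"A", "B", "C", "D"}
	inputs := []interface{}{2, []int{0, 2}, []bool{true, false, true, false}, "C", []string{"A", "C"}, 3.5}
	for _, idx := range inputs {
		cols, err := resolveColumns(names, idx)
		fmt.Println(cols, err)
	}
}
```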