dataframe

package

Version: v0.7.0 (latest) | Published: Nov 27, 2016 | License: Apache-2.0 | Imports: 10 | Imported by: 0

Repository

github.com/jueyanyingyu/gota

Documentation ¶

Index ¶

func CfgColumnTypes(coltypes map[string]series.Type) func(*LoadOptions)
func CfgDefaultType(t series.Type) func(*LoadOptions)
func CfgDetectTypes(b bool) func(*LoadOptions)
func CfgHasHeader(b bool) func(*LoadOptions)
type DataFrame
type F
type LoadOptions
type SelectIndexes

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func CfgColumnTypes ¶

func CfgColumnTypes(coltypes map[string]series.Type) func(*LoadOptions)

CfgColumnTypes sets the types option for LoadOptions.

func CfgDefaultType ¶

func CfgDefaultType(t series.Type) func(*LoadOptions)

CfgDefaultType sets the defaultType option for LoadOptions.

func CfgDetectTypes ¶

func CfgDetectTypes(b bool) func(*LoadOptions)

CfgDetectTypes sets the detectTypes option for LoadOptions.

func CfgHasHeader ¶

func CfgHasHeader(b bool) func(*LoadOptions)

CfgHasHeader sets the hasHeader option for LoadOptions.
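
The four Cfg functions above all return a func(*LoadOptions), which is the shape of the functional options pattern. The following is a minimal standalone sketch of that pattern, not the package's implementation: loadOptions, colType, buildOptions, the lowercase option functions, and the default values are all hypothetical stand-ins, since the real LoadOptions fields are unexported.

```go
package main

import "fmt"

// colType is a hypothetical stand-in for series.Type.
type colType string

// loadOptions is a hypothetical mirror of the unexported LoadOptions fields.
type loadOptions struct {
	hasHeader   bool
	detectTypes bool
	defaultType colType
}

// cfgHasHeader returns an option that sets the hasHeader field.
func cfgHasHeader(b bool) func(*loadOptions) {
	return func(o *loadOptions) { o.hasHeader = b }
}

// cfgDefaultType returns an option that sets the defaultType field.
func cfgDefaultType(t colType) func(*loadOptions) {
	return func(o *loadOptions) { o.defaultType = t }
}

// buildOptions starts from assumed defaults and applies each option in order,
// so later options override earlier ones.
func buildOptions(options ...func(*loadOptions)) loadOptions {
	cfg := loadOptions{hasHeader: true, detectTypes: true, defaultType: "string"}
	for _, opt := range options {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := buildOptions(cfgHasHeader(false), cfgDefaultType("float"))
	fmt.Printf("%+v\n", cfg)
}
```

Because the options are variadic, callers that are happy with the defaults pass nothing at all, which is how the loading examples in this package call LoadRecords.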

Types ¶

type DataFrame ¶

type DataFrame struct {
	Err error
	// contains filtered or unexported fields
}

DataFrame is a data structure designed for operating on table-like data (such as Excel sheets, CSV files, SQL table results...) where every column has to keep type integrity. As a general rule of thumb, variables are stored in columns, and every row of a DataFrame represents an observation for each variable.

In the real world, data is messy and measurements are sometimes missing. For this reason, DataFrame supports NaN elements and allows the most common data cleaning and munging operations such as subsetting, filtering, type transformations, etc. In addition, this library provides the functions needed to concatenate DataFrames (by rows or columns), different join operations (Inner, Outer, Left, Right, Cross), and the ability to read and write different formats (CSV/JSON).

func LoadMaps ¶

func LoadMaps(maps []map[string]interface{}, options ...func(*LoadOptions)) DataFrame

LoadMaps creates a new DataFrame based on the given maps. This function assumes that every map in the slice represents a row of observations.

Example ¶

package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
)

func main() {
	df := dataframe.LoadMaps(
		[]map[string]interface{}{
			map[string]interface{}{
				"A": "a",
				"B": 1,
				"C": true,
				"D": 0,
			},
			map[string]interface{}{
				"A": "b",
				"B": 2,
				"C": true,
				"D": 0.5,
			},
		},
	)
	fmt.Println(df)
}

func LoadRecords ¶

func LoadRecords(records [][]string, options ...func(*LoadOptions)) DataFrame

LoadRecords creates a new DataFrame based on the given records.

Example ¶

package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			[]string{"A", "B", "C", "D"},
			[]string{"a", "4", "5.1", "true"},
			[]string{"k", "5", "7.0", "true"},
			[]string{"k", "4", "6.0", "true"},
			[]string{"a", "2", "7.1", "false"},
		},
	)
	fmt.Println(df)
}

Example (Options) ¶

package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
	"github.com/kniren/gota/series"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			[]string{"A", "B", "C", "D"},
			[]string{"a", "4", "5.1", "true"},
			[]string{"k", "5", "7.0", "true"},
			[]string{"k", "4", "6.0", "true"},
			[]string{"a", "2", "7.1", "false"},
		},
		dataframe.CfgDetectTypes(false),
		dataframe.CfgDefaultType(series.Float),
		dataframe.CfgColumnTypes(map[string]series.Type{
			"A": series.String,
			"D": series.Bool,
		}),
	)
	fmt.Println(df)
}

func New ¶

func New(se ...series.Series) DataFrame

New is the generic DataFrame constructor. It builds a DataFrame from the given Series, one per column.

Example ¶

package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
	"github.com/kniren/gota/series"
)

func main() {
	df := dataframe.New(
		series.New([]string{"b", "a"}, series.String, "COL.1"),
		series.New([]int{1, 2}, series.Int, "COL.2"),
		series.New([]float64{3.0, 4.0}, series.Float, "COL.3"),
	)
	fmt.Println(df)
}

func ReadCSV ¶

func ReadCSV(r io.Reader, options ...func(*LoadOptions)) DataFrame

ReadCSV reads a CSV file from an io.Reader and builds a DataFrame with the resulting records.

Example ¶

package main

import (
	"fmt"
	"strings"

	"github.com/kniren/gota/dataframe"
)

func main() {
	csvStr := `
Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,17,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,NA,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United States",2012-02-01,32,321.31,54320
Spain,2012-02-01,66,555.42,00241
`
	df := dataframe.ReadCSV(strings.NewReader(csvStr))
	fmt.Println(df)
}

func ReadJSON ¶

func ReadJSON(r io.Reader, options ...func(*LoadOptions)) DataFrame

ReadJSON reads a JSON array from an io.Reader and builds a DataFrame with the resulting records.

Example ¶

package main

import (
	"fmt"
	"strings"

	"github.com/kniren/gota/dataframe"
)

func main() {
	jsonStr := `[{"COL.2":1,"COL.3":3},{"COL.1":5,"COL.2":2,"COL.3":2},{"COL.1":6,"COL.2":3,"COL.3":1}]`
	df := dataframe.ReadJSON(strings.NewReader(jsonStr))
	fmt.Println(df)
}

func (DataFrame) CBind ¶

func (df DataFrame) CBind(dfb DataFrame) DataFrame

CBind combines the columns of two DataFrames

func (DataFrame) Col ¶

func (df DataFrame) Col(colname string) series.Series

Col returns the Series with the given column name contained in the DataFrame.

func (DataFrame) Copy ¶

func (df DataFrame) Copy() DataFrame

Copy returns a copy of the DataFrame

func (DataFrame) CrossJoin ¶

func (df DataFrame) CrossJoin(b DataFrame) DataFrame

CrossJoin returns a DataFrame containing the cross join of two DataFrames.
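
A cross join is the Cartesian product of the rows of both tables. The sketch below illustrates that idea on plain string records, assuming no particular internal representation; crossJoin is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// crossJoin pairs every row of a with every row of b (Cartesian product).
// Each output row is a fresh slice: the columns of a followed by those of b.
func crossJoin(a, b [][]string) [][]string {
	out := make([][]string, 0, len(a)*len(b))
	for _, ra := range a {
		for _, rb := range b {
			row := append(append([]string{}, ra...), rb...)
			out = append(out, row)
		}
	}
	return out
}

func main() {
	a := [][]string{{"a", "1"}, {"b", "2"}}
	b := [][]string{{"x"}, {"y"}, {"z"}}
	for _, row := range crossJoin(a, b) {
		fmt.Println(row)
	}
}
```

The result has len(a) * len(b) rows, so cross joins grow quickly with input size.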

func (DataFrame) Dim ¶

func (df DataFrame) Dim() (dim [2]int)

Dim retrieves the dimensions of a DataFrame.

func (DataFrame) Filter ¶

func (df DataFrame) Filter(filters ...F) DataFrame

Filter selects the rows of a DataFrame that match the given filters. All filters passed to a single Filter call are combined with OR, whereas chained Filter calls are combined with AND.
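
The OR-within-a-call and AND-across-calls behaviour can be illustrated without the library. This is a sketch under assumptions: row, pred, filter, and eq are hypothetical names, and rows are modelled as plain maps rather than typed columns.

```go
package main

import "fmt"

// row is a hypothetical record keyed by column name.
type row map[string]string

// pred is a hypothetical predicate standing in for dataframe.F.
type pred func(row) bool

// filter keeps a row when ANY of the predicates matches (OR).
func filter(rows []row, preds ...pred) []row {
	var out []row
	for _, r := range rows {
		for _, p := range preds {
			if p(r) {
				out = append(out, r)
				break // one match is enough; avoid adding the row twice
			}
		}
	}
	return out
}

// eq builds a predicate that compares one column against a value.
func eq(col, val string) pred {
	return func(r row) bool { return r[col] == val }
}

func main() {
	rows := []row{
		{"A": "a", "D": "true"},
		{"A": "k", "D": "true"},
		{"A": "a", "D": "false"},
		{"A": "z", "D": "false"},
	}
	// One call, two predicates: A == "a" OR D == "true".
	either := filter(rows, eq("A", "a"), eq("D", "true"))
	// Two chained calls: A == "a" AND D == "true".
	both := filter(filter(rows, eq("A", "a")), eq("D", "true"))
	fmt.Println(len(either), len(both)) // prints: 3 1
}
```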

Example ¶

package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
	"github.com/kniren/gota/series"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			[]string{"A", "B", "C", "D"},
			[]string{"a", "4", "5.1", "true"},
			[]string{"k", "5", "7.0", "true"},
			[]string{"k", "4", "6.0", "true"},
			[]string{"a", "2", "7.1", "false"},
		},
	)
	fil := df.Filter(
		dataframe.F{"A", series.Eq, "a"},
		dataframe.F{"B", series.Greater, 4},
	)
	fil2 := fil.Filter(
		dataframe.F{"D", series.Eq, true},
	)
	fmt.Println(fil)
	fmt.Println(fil2)
}

func (DataFrame) InnerJoin ¶

func (df DataFrame) InnerJoin(b DataFrame, keys ...string) DataFrame

InnerJoin returns a DataFrame containing the inner join of two DataFrames.
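
An inner join keeps only the row pairs whose key values match in both tables. The following hash-join sketch shows the idea on string records with a single key column. It is an illustration only: innerJoin and without are hypothetical helpers, and the output column order (key first, then the remaining columns of each side) is an assumption rather than documented behaviour of this package.

```go
package main

import "fmt"

// without returns a copy of r with the element at index i removed.
func without(r []string, i int) []string {
	out := make([]string, 0, len(r)-1)
	out = append(out, r[:i]...)
	return append(out, r[i+1:]...)
}

// innerJoin joins a and b where a[ka] == b[kb].
func innerJoin(a, b [][]string, ka, kb int) [][]string {
	// Index the rows of b by their key value.
	index := make(map[string][][]string)
	for _, rb := range b {
		index[rb[kb]] = append(index[rb[kb]], rb)
	}
	var out [][]string
	for _, ra := range a {
		// Rows of a with no match in b produce no output.
		for _, rb := range index[ra[ka]] {
			row := []string{ra[ka]}
			row = append(row, without(ra, ka)...)
			row = append(row, without(rb, kb)...)
			out = append(out, row)
		}
	}
	return out
}

func main() {
	a := [][]string{{"a", "true"}, {"k", "true"}, {"a", "false"}}
	b := [][]string{{"1", "true"}, {"4", "false"}, {"2", "false"}}
	for _, row := range innerJoin(a, b, 1, 1) {
		fmt.Println(row)
	}
}
```

A left, right, or outer join differs only in what happens to unmatched rows: instead of being dropped, they are kept and padded with missing values on the other side.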

Example ¶

package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			[]string{"A", "B", "C", "D"},
			[]string{"a", "4", "5.1", "true"},
			[]string{"k", "5", "7.0", "true"},
			[]string{"k", "4", "6.0", "true"},
			[]string{"a", "2", "7.1", "false"},
		},
	)
	df2 := dataframe.LoadRecords(
		[][]string{
			[]string{"A", "F", "D"},
			[]string{"1", "1", "true"},
			[]string{"4", "2", "false"},
			[]string{"2", "8", "false"},
			[]string{"5", "9", "false"},
		},
	)
	// Join df and df2 on the shared key column D
	join := df.InnerJoin(df2, "D")
	fmt.Println(join)
}

func (DataFrame) LeftJoin ¶

func (df DataFrame) LeftJoin(b DataFrame, keys ...string) DataFrame

LeftJoin returns a DataFrame containing the left join of two DataFrames.

func (DataFrame) Maps ¶

func (df DataFrame) Maps() []map[string]interface{}

Maps returns the slice-of-maps representation of a DataFrame.

func (DataFrame) Mutate ¶

func (df DataFrame) Mutate(s series.Series) DataFrame

Mutate changes a column of the DataFrame with the given Series or adds it as a new column if the column name does not exist.
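
The replace-or-append rule can be sketched on a plain slice of named columns. The column type and mutate function here are hypothetical and only model the rule described above; the sketch copies its input so the original is left untouched, in line with Mutate returning a new DataFrame.

```go
package main

import "fmt"

// column is a hypothetical named column of string values.
type column struct {
	name   string
	values []string
}

// mutate returns a new slice in which col replaces the column with the same
// name, or is appended when no column has that name.
func mutate(cols []column, col column) []column {
	out := make([]column, len(cols))
	copy(out, cols)
	for i := range out {
		if out[i].name == col.name {
			out[i] = col
			return out
		}
	}
	return append(out, col)
}

func main() {
	cols := []column{
		{"A", []string{"a", "k"}},
		{"C", []string{"5.1", "7.0"}},
	}
	replaced := mutate(cols, column{"C", []string{"x", "y"}})
	added := mutate(cols, column{"E", []string{"x", "y"}})
	fmt.Println(len(replaced), len(added))
}
```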

Example ¶

package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
	"github.com/kniren/gota/series"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			[]string{"A", "B", "C", "D"},
			[]string{"a", "4", "5.1", "true"},
			[]string{"k", "5", "7.0", "true"},
			[]string{"k", "4", "6.0", "true"},
			[]string{"a", "2", "7.1", "false"},
		},
	)
	// Change column C with a new one
	mut := df.Mutate(
		series.New([]string{"a", "b", "c", "d"}, series.String, "C"),
	)
	// Add a new column E
	mut2 := df.Mutate(
		series.New([]string{"a", "b", "c", "d"}, series.String, "E"),
	)
	fmt.Println(mut)
	fmt.Println(mut2)
}

func (DataFrame) Names ¶

func (df DataFrame) Names() []string

Names returns the names of the columns of a DataFrame.

func (DataFrame) Ncol ¶

func (df DataFrame) Ncol() int

Ncol returns the number of columns in a DataFrame.

func (DataFrame) Nrow ¶

func (df DataFrame) Nrow() int

Nrow returns the number of rows in a DataFrame.

func (DataFrame) OuterJoin ¶

func (df DataFrame) OuterJoin(b DataFrame, keys ...string) DataFrame

OuterJoin returns a DataFrame containing the outer join of two DataFrames.

func (DataFrame) RBind ¶

func (df DataFrame) RBind(dfb DataFrame) DataFrame

RBind matches the column names of two DataFrames and returns the combination of the rows of both of them.
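
Matching by column name means the second table's columns may appear in a different order from the first. A minimal sketch of that idea follows. The table type and rbind function are hypothetical, and returning an error when the column sets differ is an assumption of this sketch, not a statement about how RBind reports problems.

```go
package main

import "fmt"

// table is a hypothetical row-oriented table with named columns.
type table struct {
	names []string
	rows  [][]string
}

// rbind appends the rows of b to a, reordering b's columns to match a's names.
func rbind(a, b table) (table, error) {
	if len(a.names) != len(b.names) {
		return table{}, fmt.Errorf("rbind: %d columns vs %d", len(a.names), len(b.names))
	}
	// Record where each column name sits in b.
	pos := make(map[string]int, len(b.names))
	for i, n := range b.names {
		pos[n] = i
	}
	// perm[i] is the index in b of a's i-th column.
	perm := make([]int, len(a.names))
	for i, n := range a.names {
		j, ok := pos[n]
		if !ok {
			return table{}, fmt.Errorf("rbind: column %q missing", n)
		}
		perm[i] = j
	}
	out := table{names: a.names, rows: append([][]string{}, a.rows...)}
	for _, rb := range b.rows {
		row := make([]string, len(perm))
		for i, j := range perm {
			row[i] = rb[j]
		}
		out.rows = append(out.rows, row)
	}
	return out, nil
}

func main() {
	a := table{names: []string{"A", "B"}, rows: [][]string{{"a", "1"}}}
	b := table{names: []string{"B", "A"}, rows: [][]string{{"2", "b"}}}
	out, err := rbind(a, b)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.rows)
}
```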

func (DataFrame) Records ¶

func (df DataFrame) Records() [][]string

Records returns the string record representation of a DataFrame.

func (DataFrame) Rename ¶

func (df DataFrame) Rename(newname, oldname string) DataFrame

Rename changes the name of one of the columns of a DataFrame

func (DataFrame) RightJoin ¶

func (df DataFrame) RightJoin(b DataFrame, keys ...string) DataFrame

RightJoin returns a DataFrame containing the right join of two DataFrames.

func (DataFrame) Select ¶

func (df DataFrame) Select(indexes SelectIndexes) DataFrame

Select returns a DataFrame containing only the given columns.

Example ¶

package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			[]string{"A", "B", "C", "D"},
			[]string{"a", "4", "5.1", "true"},
			[]string{"k", "5", "7.0", "true"},
			[]string{"k", "4", "6.0", "true"},
			[]string{"a", "2", "7.1", "false"},
		},
	)
	sel1 := df.Select([]int{0, 2})
	sel2 := df.Select([]string{"A", "C"})
	fmt.Println(sel1)
	fmt.Println(sel2)
}

func (DataFrame) Set ¶

func (df DataFrame) Set(indexes series.Indexes, newvalues DataFrame) DataFrame

Set updates the values of a DataFrame for the rows selected via indexes.

Example ¶

package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
	"github.com/kniren/gota/series"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			[]string{"A", "B", "C", "D"},
			[]string{"a", "4", "5.1", "true"},
			[]string{"k", "5", "7.0", "true"},
			[]string{"k", "4", "6.0", "true"},
			[]string{"a", "2", "7.1", "false"},
		},
	)
	df2 := df.Set(
		series.Ints([]int{0, 2}),
		dataframe.LoadRecords(
			[][]string{
				[]string{"A", "B", "C", "D"},
				[]string{"b", "4", "6.0", "true"},
				[]string{"c", "3", "6.0", "false"},
			},
		),
	)
	fmt.Println(df2)
}

func (DataFrame) SetNames ¶

func (df DataFrame) SetNames(colnames []string) error

SetNames changes the column names of a DataFrame to the ones passed as an argument.

func (DataFrame) String ¶

func (df DataFrame) String() (str string)

String implements the Stringer interface for DataFrame

func (DataFrame) Subset ¶

func (df DataFrame) Subset(indexes series.Indexes) DataFrame

Subset returns a subset of the rows of the original DataFrame based on the Series subsetting indexes.

Example ¶

package main

import (
	"fmt"

	"github.com/kniren/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			[]string{"A", "B", "C", "D"},
			[]string{"a", "4", "5.1", "true"},
			[]string{"k", "5", "7.0", "true"},
			[]string{"k", "4", "6.0", "true"},
			[]string{"a", "2", "7.1", "false"},
		},
	)
	sub := df.Subset([]int{0, 2})
	fmt.Println(sub)
}

func (DataFrame) Types ¶

func (df DataFrame) Types() []series.Type

Types returns the types of the columns of a DataFrame.

func (DataFrame) WriteCSV ¶

func (df DataFrame) WriteCSV(w io.Writer) error

WriteCSV writes the DataFrame to the given io.Writer as a CSV file.
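
Writing string records as CSV to any io.Writer can be done with the standard library alone. The sketch below shows that, assuming the records have the [][]string shape that Records returns, with the header as the first record; writeRecords and toCSV are hypothetical helpers, and this is not claimed to be how WriteCSV is implemented.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// writeRecords writes all records to w as CSV. WriteAll flushes the writer
// and returns any error encountered during the flush.
func writeRecords(w io.Writer, records [][]string) error {
	return csv.NewWriter(w).WriteAll(records)
}

// toCSV renders records into a string, which is convenient for inspection.
func toCSV(records [][]string) (string, error) {
	var sb strings.Builder
	err := writeRecords(&sb, records)
	return sb.String(), err
}

func main() {
	out, err := toCSV([][]string{
		{"A", "B"},
		{"a", "4"},
		{"k, quoted", "5"}, // fields containing commas are quoted automatically
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Accepting an io.Writer rather than a file name lets the same function target files, network connections, or in-memory buffers.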

func (DataFrame) WriteJSON ¶

func (df DataFrame) WriteJSON(w io.Writer) error

WriteJSON writes the DataFrame to the given io.Writer as a JSON array.
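
A JSON array of row objects has the same shape as the []map[string]interface{} value that Maps returns. The standard-library sketch below encodes such a value to an io.Writer; writeJSON and toJSON are hypothetical helpers, and the exact output format of WriteJSON itself is not asserted here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// writeJSON encodes rows as a JSON array of objects, followed by a newline.
// encoding/json sorts map keys, so the output is deterministic.
func writeJSON(w io.Writer, rows []map[string]interface{}) error {
	return json.NewEncoder(w).Encode(rows)
}

// toJSON renders rows into a string, which is convenient for inspection.
func toJSON(rows []map[string]interface{}) (string, error) {
	var sb strings.Builder
	err := writeJSON(&sb, rows)
	return sb.String(), err
}

func main() {
	out, err := toJSON([]map[string]interface{}{
		{"A": "a", "B": 1},
		{"A": "b", "B": 2},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```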

type F ¶

type F struct {
	Colname    string
	Comparator series.Comparator
	Comparando interface{}
}

F is the filtering structure

type LoadOptions ¶

type LoadOptions struct {
	// contains filtered or unexported fields
}

LoadOptions is the configuration that will be used for the loading operations

type SelectIndexes ¶

type SelectIndexes interface{}

SelectIndexes are the supported indexes used for the DataFrame.Select method. Currently supported are:

int              // Matches the given index number
[]int            // Matches all given index numbers
[]bool           // Matches all columns marked as true
string           // Matches the column with the matching column name
[]string         // Matches all columns with the matching column names
Series [Int]     // Same as []int
Series [Bool]    // Same as []bool
Series [String]  // Same as []string
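
Because SelectIndexes is an empty interface, the accepted types have to be distinguished at run time. One common way to do that in Go is a type switch, sketched below for the non-Series cases only. The resolve function is hypothetical, it omits bounds checking and the three Series variants, and it is not taken from the package source.

```go
package main

import "fmt"

// resolve turns a supported index value into column positions.
// names holds the column names in order.
func resolve(indexes interface{}, names []string) ([]int, error) {
	// find looks up the position of a column by name.
	find := func(name string) (int, error) {
		for i, n := range names {
			if n == name {
				return i, nil
			}
		}
		return 0, fmt.Errorf("column %q not found", name)
	}
	switch idx := indexes.(type) {
	case int:
		return []int{idx}, nil
	case []int:
		return idx, nil
	case []bool:
		var out []int
		for i, keep := range idx {
			if keep {
				out = append(out, i)
			}
		}
		return out, nil
	case string:
		i, err := find(idx)
		if err != nil {
			return nil, err
		}
		return []int{i}, nil
	case []string:
		out := make([]int, 0, len(idx))
		for _, name := range idx {
			i, err := find(name)
			if err != nil {
				return nil, err
			}
			out = append(out, i)
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unsupported index type %T", indexes)
	}
}

func main() {
	names := []string{"A", "B", "C", "D"}
	fmt.Println(resolve([]string{"A", "C"}, names))
	fmt.Println(resolve([]bool{true, false, true, false}, names))
	fmt.Println(resolve(1.5, names))
}
```

The trade-off of an interface{} parameter is that unsupported types are only caught at run time, which is why the default branch returns an error.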
