package dataframe

Version: v1.0.14 (latest) | Published: Sep 3, 2024 | License: Apache-2.0 | Imports: 15 | Imported by: 0

Repository

github.com/dreamsxin/gota

Documentation ¶

Overview ¶

Package dataframe provides an implementation of data frames and methods to subset, join, mutate, set, arrange, summarize, etc.

Index ¶

Constants
type Aggregation
- func (a Aggregation) String() string
type AggregationType
- func (i AggregationType) String() string
type DataFrame
- func LoadMaps(maps []map[string]interface{}, options ...LoadOption) DataFrame
- func LoadMatrix(mat Matrix) DataFrame
- func LoadRecords(records [][]string, options ...LoadOption) DataFrame
- func LoadStructs(i interface{}, options ...LoadOption) DataFrame
- func New(se ...series.Series) DataFrame
- func ReadCSV(r io.Reader, options ...LoadOption) DataFrame
- func ReadHTML(r io.Reader, options ...LoadOption) []DataFrame
- func ReadJSON(r io.Reader, options ...LoadOption) DataFrame
- func (df DataFrame) Arrange(order ...Order) DataFrame
- func (df DataFrame) At(i, j int) float64
- func (df DataFrame) CBind(dfb DataFrame) DataFrame
- func (df DataFrame) Capply(f func(series.Series) series.Series) DataFrame
- func (df DataFrame) Col(colname string) series.Series
- func (df DataFrame) ColIndex(s string) int
- func (df DataFrame) Concat(dfb DataFrame) DataFrame
- func (df DataFrame) Copy() DataFrame
- func (df DataFrame) CrossJoin(b DataFrame) DataFrame
- func (df DataFrame) Describe() DataFrame
- func (df DataFrame) Dims() (int, int)
- func (df DataFrame) Drop(indexes SelectIndexes) DataFrame
- func (df DataFrame) Elem(r, c int) series.Element
- func (df *DataFrame) Error() error
- func (df DataFrame) FillNaN(colname string, value series.Series) DataFrame
- func (df DataFrame) Filter(filters ...F) DataFrame
- func (df DataFrame) FilterAggregation(agg Aggregation, filters ...F) DataFrame
- func (df DataFrame) GetRow(r int, funcs ...func(series.Type, interface{}) interface{}) (m map[string]interface{})
- func (df DataFrame) GroupBy(colnames ...string) *Groups
- func (df DataFrame) InnerJoin(b DataFrame, keys ...string) DataFrame
- func (df DataFrame) LeftJoin(b DataFrame, keys ...string) DataFrame
- func (df DataFrame) Maps(funcs ...func(series.Type, interface{}) interface{}) []map[string]interface{}
- func (df DataFrame) Mutate(ss ...series.Series) DataFrame
- func (df DataFrame) Names() []string
- func (df DataFrame) Ncol() int
- func (df DataFrame) Nrow() int
- func (df DataFrame) OuterJoin(b DataFrame, keys ...string) DataFrame
- func (df DataFrame) Pivot(rows []string, columns []string, values []PivotValue) DataFrame
- func (df DataFrame) Print(shortRows, shortCols, showDims, showTypes bool) (str string)
- func (df DataFrame) RBind(dfb DataFrame) DataFrame
- func (df DataFrame) Rapply(f func(series.Series) series.Series) DataFrame
- func (df DataFrame) Records() [][]string
- func (df DataFrame) Rename(newname, oldname string) DataFrame
- func (df DataFrame) RightJoin(b DataFrame, keys ...string) DataFrame
- func (df DataFrame) Select(indexes SelectIndexes) DataFrame
- func (df DataFrame) Set(indexes series.Indexes, newvalues DataFrame) DataFrame
- func (df DataFrame) SetNames(colnames ...string) error
- func (df DataFrame) Show() error
- func (df DataFrame) SliceRow(start, end int) DataFrame
- func (df DataFrame) String() (str string)
- func (df DataFrame) Subset(indexes series.Indexes) DataFrame
- func (df DataFrame) T() mat.Matrix
- func (df DataFrame) Types() []series.Type
- func (df DataFrame) WriteCSV(w io.Writer, options ...WriteOption) error
- func (df DataFrame) WriteJSON(w io.Writer) error
type F
type Groups
- func (gps Groups) Agg(typ AggregationType, colnames []string) DataFrame
- func (gps Groups) Aggregation(typs []AggregationType, colnames []string) DataFrame
- func (g Groups) GetGroups() map[string]DataFrame
type LoadOption
- func DefaultType(t series.Type) LoadOption
- func DetectTypes(b bool) LoadOption
- func HasHeader(b bool) LoadOption
- func NaNValues(nanValues []string) LoadOption
- func Names(names ...string) LoadOption
- func WithComments(b rune) LoadOption
- func WithDelimiter(b rune) LoadOption
- func WithLazyQuotes(b bool) LoadOption
- func WithSkipColIdxs(m map[int]int) LoadOption
- func WithSkipColNames(m map[string]string) LoadOption
- func WithTypes(coltypes map[string]series.Type) LoadOption
type Matrix
type Order
- func RevSort(colname string) Order
- func Sort(colname string) Order
type PivotValue
type SelectIndexes
type WriteOption
- func WriteHeader(b bool) WriteOption

Constants ¶

const KEY_ERROR = "KEY_ERROR"

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Aggregation ¶

type Aggregation int

Aggregation defines the filter aggregation

const (
	// Or aggregates filters with logical or
	Or Aggregation = iota
	// And aggregates filters with logical and
	And
)

func (Aggregation) String ¶

func (a Aggregation) String() string

type AggregationType ¶

type AggregationType int

AggregationType is the aggregation method type.

const (
	Aggregation_MAX    AggregationType = iota + 1 // MAX
	Aggregation_MIN                               // MIN
	Aggregation_MEAN                              // MEAN
	Aggregation_MEDIAN                            // MEDIAN
	Aggregation_STD                               // STD
	Aggregation_SUM                               // SUM
	Aggregation_COUNT                             // COUNT
)

func (AggregationType) String ¶

func (i AggregationType) String() string

type DataFrame ¶

type DataFrame struct {

	// deprecated: Use Error() instead
	Err error
	// contains filtered or unexported fields
}

DataFrame is a data structure designed for operating on table-like data (such as Excel sheets, CSV files, or SQL query results) where every column has to keep type integrity. As a general rule of thumb, variables are stored in columns, and every row of a DataFrame represents an observation of each variable.

In the real world, data is messy and sometimes contains non-measurements or missing values. For this reason, DataFrame supports NaN elements and allows the most common data cleaning and munging operations, such as subsetting, filtering, and type transformations. In addition, this library provides functions to concatenate DataFrames (by rows or columns), several join operations (Inner, Outer, Left, Right, Cross), and the ability to read and write different formats (CSV/JSON).

func LoadMaps ¶

func LoadMaps(maps []map[string]interface{}, options ...LoadOption) DataFrame

LoadMaps creates a new DataFrame based on the given maps. This function assumes that every map in the slice represents a row of observations.

Example ¶

package main

import (
	"fmt"

	"github.com/dreamsxin/gota/dataframe"
)

func main() {
	df := dataframe.LoadMaps(
		[]map[string]interface{}{
			{
				"A": "a",
				"B": 1,
				"C": true,
				"D": 0,
			},
			{
				"A": "b",
				"B": 2,
				"C": true,
				"D": 0.5,
			},
		},
	)
	fmt.Println(df)

}

Output:

[2x4] DataFrame

    A        B     C      D
 0: a        1     true   0.000000
 1: b        2     true   0.500000
    <string> <int> <bool> <float>

func LoadMatrix ¶

func LoadMatrix(mat Matrix) DataFrame

LoadMatrix loads the given Matrix as a DataFrame.

TODO: Add LoadOptions.

func LoadRecords ¶

func LoadRecords(records [][]string, options ...LoadOption) DataFrame

LoadRecords creates a new DataFrame based on the given records.

Example ¶

package main

import (
	"fmt"

	"github.com/dreamsxin/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
	)
	fmt.Println(df)

}

Output:

[4x4] DataFrame

    A        B     C        D
 0: a        4     5.100000 true
 1: k        5     7.000000 true
 2: k        4     6.000000 true
 3: a        2     7.100000 false
    <string> <int> <float>  <bool>

Example (Options) ¶

package main

import (
	"fmt"

	"github.com/dreamsxin/gota/dataframe"
	"github.com/dreamsxin/gota/series"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
		dataframe.DetectTypes(false),
		dataframe.DefaultType(series.Float),
		dataframe.WithTypes(map[string]series.Type{
			"A": series.String,
			"D": series.Bool,
		}),
	)
	fmt.Println(df)

}

Output:

[4x4] DataFrame

    A        B        C        D
 0: a        4.000000 5.100000 true
 1: k        5.000000 7.000000 true
 2: k        4.000000 6.000000 true
 3: a        2.000000 7.100000 false
    <string> <float>  <float>  <bool>

func LoadStructs ¶

func LoadStructs(i interface{}, options ...LoadOption) DataFrame

LoadStructs creates a new DataFrame from arbitrary struct slices.

LoadStructs will ignore unexported fields inside a struct. Note also that, unless otherwise specified, the column names will correspond to the names of the fields.

You can configure each field with the `dataframe:"name[,type]"` struct tag. If the name on the tag is the empty string `""` the field name will be used instead. If the name is `"-"` the field will be ignored.

Examples:

// field is unexported and will be ignored
field int

// Field will be ignored
Field int `dataframe:"-"`

// Field will be parsed with column name Field and type int
Field int

// Field will be parsed with column name `field_column` and type int.
Field int `dataframe:"field_column"`

// Field will be parsed with column name `field` and type string.
Field int `dataframe:"field,string"`

// Field will be parsed with column name `Field` and type string.
Field int `dataframe:",string"`

If the struct tags and the given LoadOptions contradict each other, the latter (the LoadOptions) take precedence over the former.
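The tag rules above can be illustrated with a small standard-library sketch. It does not import this package and is not its implementation: `columnSpec`, `parseField`, `describe`, and `record` are hypothetical names introduced here, and the real LoadStructs may resolve tags differently. It needs Go 1.18 or later for `strings.Cut`.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// columnSpec is a hypothetical description of how one struct field would be loaded.
type columnSpec struct {
	Name    string // resulting column name
	Type    string // requested type; "" means "derive it from the Go type"
	Ignored bool   // true when the field produces no column
}

// parseField applies the documented rules for the `dataframe:"name[,type]"` tag.
func parseField(f reflect.StructField) columnSpec {
	// Unexported fields are ignored.
	if !f.IsExported() {
		return columnSpec{Ignored: true}
	}
	spec := columnSpec{Name: f.Name}
	tag, ok := f.Tag.Lookup("dataframe")
	if !ok {
		return spec // no tag: column named after the field
	}
	name, typ, _ := strings.Cut(tag, ",")
	if name == "-" {
		return columnSpec{Ignored: true}
	}
	if name != "" {
		spec.Name = name // an empty name keeps the field name
	}
	spec.Type = typ
	return spec
}

// describe returns one columnSpec per field of the struct value v.
func describe(v interface{}) []columnSpec {
	t := reflect.TypeOf(v)
	specs := make([]columnSpec, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		specs = append(specs, parseField(t.Field(i)))
	}
	return specs
}

// record mirrors the six cases listed in the documentation above.
type record struct {
	hidden  int
	Skipped int `dataframe:"-"`
	Plain   int
	Renamed int `dataframe:"field_column"`
	Both    int `dataframe:"field,string"`
	Typed   int `dataframe:",string"`
}

func main() {
	for _, s := range describe(record{}) {
		fmt.Printf("%+v\n", s)
	}
}
```

The sketch only resolves names and requested types. How LoadOptions such as WithTypes override these results is left out.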

Example ¶

package main

import (
	"fmt"

	"github.com/dreamsxin/gota/dataframe"
)

func main() {
	type User struct {
		Name     string
		Age      int
		Accuracy float64
	}
	users := []User{
		{"Aram", 17, 0.2},
		{"Juan", 18, 0.8},
		{"Ana", 22, 0.5},
	}
	df := dataframe.LoadStructs(users)
	fmt.Println(df)

}

Output:

[3x3] DataFrame

    Name     Age   Accuracy
 0: Aram     17    0.200000
 1: Juan     18    0.800000
 2: Ana      22    0.500000
    <string> <int> <float>

func New ¶

func New(se ...series.Series) DataFrame

New is the generic DataFrame constructor

Example ¶

package main

import (
	"fmt"

	"github.com/dreamsxin/gota/dataframe"
	"github.com/dreamsxin/gota/series"
)

func main() {
	df := dataframe.New(
		series.New([]string{"b", "a"}, series.String, "COL.1"),
		series.New([]int{1, 2}, series.Int, "COL.2"),
		series.New([]float64{3.0, 4.0}, series.Float, "COL.3"),
	)
	fmt.Println(df)

}

Output:

[2x3] DataFrame

    COL.1    COL.2 COL.3
 0: b        1     3.000000
 1: a        2     4.000000
    <string> <int> <float>

func ReadCSV ¶

func ReadCSV(r io.Reader, options ...LoadOption) DataFrame

ReadCSV reads a CSV file from an io.Reader and builds a DataFrame with the resulting records.

Example ¶

package main

import (
	"fmt"
	"strings"

	"github.com/dreamsxin/gota/dataframe"
)

func main() {
	csvStr := `
Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,17,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,NA,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United States",2012-02-01,32,321.31,54320
Spain,2012-02-01,66,555.42,00241
`
	df := dataframe.ReadCSV(strings.NewReader(csvStr))
	fmt.Println(df)

}

Output:

[8x5] DataFrame

    Country        Date       Age   Amount     Id
 0: United States  2012-02-01 50    112.100000 1234
 1: United States  2012-02-01 32    321.310000 54320
 2: United Kingdom 2012-02-01 17    18.200000  12345
 3: United States  2012-02-01 32    321.310000 54320
 4: United Kingdom 2012-02-01 NaN   18.200000  12345
 5: United States  2012-02-01 32    321.310000 54320
 6: United States  2012-02-01 32    321.310000 54320
 7: Spain          2012-02-01 66    555.420000 241
    <string>       <string>   <int> <float>    <int>

func ReadHTML ¶

func ReadHTML(r io.Reader, options ...LoadOption) []DataFrame

func ReadJSON ¶

func ReadJSON(r io.Reader, options ...LoadOption) DataFrame

ReadJSON reads a JSON array from an io.Reader and builds a DataFrame with the resulting records.

Example ¶

package main

import (
	"fmt"
	"strings"

	"github.com/dreamsxin/gota/dataframe"
)

func main() {
	jsonStr := `[{"COL.2":1,"COL.3":3},{"COL.1":5,"COL.2":2,"COL.3":2},{"COL.1":6,"COL.2":3,"COL.3":1}]`
	df := dataframe.ReadJSON(strings.NewReader(jsonStr))
	fmt.Println(df)

}

Output:

[3x3] DataFrame

    COL.1 COL.2 COL.3
 0: NaN   1     3
 1: 5     2     2
 2: 6     3     1
    <int> <int> <int>

func (DataFrame) Arrange ¶

func (df DataFrame) Arrange(order ...Order) DataFrame

Arrange sorts the rows of a DataFrame according to the given Order.

Example ¶

package main

import (
	"fmt"

	"github.com/dreamsxin/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"b", "4", "6.0", "true"},
			{"c", "3", "6.0", "false"},
			{"a", "2", "7.1", "false"},
		},
	)
	sorted := df.Arrange(
		dataframe.Sort("A"),
		dataframe.RevSort("B"),
	)
	fmt.Println(sorted)

}

Output:

[4x4] DataFrame

    A        B     C        D
 0: a        4     5.100000 true
 1: a        2     7.100000 false
 2: b        4     6.000000 true
 3: c        3     6.000000 false
    <string> <int> <float>  <bool>

func (DataFrame) At ¶ added in v1.0.10

func (df DataFrame) At(i, j int) float64

func (DataFrame) CBind ¶

func (df DataFrame) CBind(dfb DataFrame) DataFrame

CBind combines the columns of this DataFrame and dfb DataFrame.

func (DataFrame) Capply ¶

func (df DataFrame) Capply(f func(series.Series) series.Series) DataFrame

Capply applies the given function to the columns of a DataFrame

func (DataFrame) Col ¶

func (df DataFrame) Col(colname string) series.Series

Col returns a copy of the Series with the given column name contained in the DataFrame.

func (DataFrame) ColIndex ¶ added in v1.0.14

func (df DataFrame) ColIndex(s string) int

ColIndex returns the index of the column with name `s`. If it fails to find the column it returns -1 instead.

func (DataFrame) Concat ¶

func (df DataFrame) Concat(dfb DataFrame) DataFrame

Concat concatenates rows of two DataFrames like RBind, but also including unmatched columns.

func (DataFrame) Copy ¶

func (df DataFrame) Copy() DataFrame

Copy returns a copy of the DataFrame

func (DataFrame) CrossJoin ¶

func (df DataFrame) CrossJoin(b DataFrame) DataFrame

CrossJoin returns a DataFrame containing the cross join of two DataFrames.

func (DataFrame) Describe ¶

func (df DataFrame) Describe() DataFrame

Describe returns a DataFrame with the summary statistics for each column of the DataFrame.

Example ¶

package main

import (
	"fmt"

	"github.com/dreamsxin/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"b", "4", "6.0", "true"},
			{"c", "3", "6.0", "false"},
			{"a", "2", "7.1", "false"},
		},
	)
	fmt.Println(df.Describe())

}

Output:

[8x5] DataFrame

    column   A        B        C        D
 0: mean     -        3.250000 6.050000 0.500000
 1: median   -        3.500000 6.000000 NaN
 2: stddev   -        0.957427 0.818535 0.577350
 3: min      a        2.000000 5.100000 0.000000
 4: 25%      -        2.000000 5.100000 0.000000
 5: 50%      -        3.000000 6.000000 0.000000
 6: 75%      -        4.000000 6.000000 1.000000
 7: max      c        4.000000 7.100000 1.000000
    <string> <string> <float>  <float>  <float>

func (DataFrame) Dims ¶

func (df DataFrame) Dims() (int, int)

Dims retrieves the dimensions of a DataFrame.

func (DataFrame) Drop ¶

func (df DataFrame) Drop(indexes SelectIndexes) DataFrame

Drop removes the given columns from the DataFrame.

func (DataFrame) Elem ¶

func (df DataFrame) Elem(r, c int) series.Element

Elem returns the element on row `r` and column `c`. Will panic if the index is out of bounds.

func (*DataFrame) Error ¶

func (df *DataFrame) Error() error

Error returns the DataFrame's error, or nil if no error occurred.

func (DataFrame) FillNaN ¶ added in v1.0.9

func (df DataFrame) FillNaN(colname string, value series.Series) DataFrame

func (DataFrame) Filter ¶

func (df DataFrame) Filter(filters ...F) DataFrame

Filter will filter the rows of a DataFrame based on the given filters. All filters on the argument of a Filter call are aggregated as an OR operation whereas if we chain Filter calls, every filter will act as an AND operation with regards to the rest.

Example ¶

package main

import (
	"fmt"

	"github.com/dreamsxin/gota/dataframe"
	"github.com/dreamsxin/gota/series"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
	)
	fil := df.Filter(
		dataframe.F{
			Colname:    "A",
			Comparator: series.Eq,
			Comparando: "a",
		},
		dataframe.F{
			Colname:    "B",
			Comparator: series.Greater,
			Comparando: 4,
		},
	)
	fil2 := fil.Filter(
		dataframe.F{
			Colname:    "D",
			Comparator: series.Eq,
			Comparando: true,
		},
	)
	fmt.Println(fil)
	fmt.Println(fil2)

}

Output:

[3x4] DataFrame

    A        B     C        D
 0: a        4     5.100000 true
 1: k        5     7.000000 true
 2: a        2     7.100000 false
    <string> <int> <float>  <bool>

[2x4] DataFrame

    A        B     C        D
 0: a        4     5.100000 true
 1: k        5     7.000000 true
    <string> <int> <float>  <bool>

func (DataFrame) FilterAggregation ¶

func (df DataFrame) FilterAggregation(agg Aggregation, filters ...F) DataFrame

FilterAggregation will filter the rows of a DataFrame based on the given filters. All filters passed to a FilterAggregation call are combined according to the supplied aggregation (Or or And).
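The difference between the two aggregations can be shown with a standard-library sketch. It does not use this package: `aggregation`, `row`, `predicate`, `keep`, and `filterRows` are hypothetical stand-ins, and the behaviour with zero filters (keep everything under And, nothing under Or) is an assumption of the sketch, not a documented property of the library.

```go
package main

import "fmt"

// aggregation mirrors the documented Or/And constants; it is a local
// stand-in, not the package's Aggregation type.
type aggregation int

const (
	or aggregation = iota
	and
)

// row is a hypothetical record type used only for this illustration.
type row struct {
	A string
	B int
}

// predicate reports whether a row passes a single filter.
type predicate func(row) bool

// keep combines all predicates with logical OR or logical AND.
func keep(r row, agg aggregation, preds ...predicate) bool {
	// Start from the identity element: false for OR, true for AND.
	result := agg == and
	for _, p := range preds {
		if agg == and {
			result = result && p(r)
		} else {
			result = result || p(r)
		}
	}
	return result
}

// filterRows returns the rows accepted by keep, preserving their order.
func filterRows(rows []row, agg aggregation, preds ...predicate) []row {
	var out []row
	for _, r := range rows {
		if keep(r, agg, preds...) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []row{{"a", 4}, {"k", 5}, {"k", 4}, {"a", 2}}
	isA := func(r row) bool { return r.A == "a" }
	bigB := func(r row) bool { return r.B > 4 }

	fmt.Println(len(filterRows(rows, or, isA, bigB)))  // 3 rows match either filter
	fmt.Println(len(filterRows(rows, and, isA, bigB))) // 0 rows match both filters
}
```

With the Or aggregation the row set is the same as in the Filter example above (three rows). With And no row satisfies both conditions.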

func (DataFrame) GetRow ¶ added in v1.0.14

func (df DataFrame) GetRow(r int, funcs ...func(series.Type, interface{}) interface{}) (m map[string]interface{})

func (DataFrame) GroupBy ¶

func (df DataFrame) GroupBy(colnames ...string) *Groups

GroupBy groups the DataFrame by the given columns.

func (DataFrame) InnerJoin ¶

func (df DataFrame) InnerJoin(b DataFrame, keys ...string) DataFrame

InnerJoin returns a DataFrame containing the inner join of two DataFrames.

Example ¶

package main

import (
	"fmt"

	"github.com/dreamsxin/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
	)
	df2 := dataframe.LoadRecords(
		[][]string{
			{"A", "F", "D"},
			{"1", "1", "true"},
			{"4", "2", "false"},
			{"2", "8", "false"},
			{"5", "9", "false"},
		},
	)
	join := df.InnerJoin(df2, "D")
	fmt.Println(join)

}

Output:

[6x6] DataFrame

    D      A_0      B     C        A_1   F
 0: true   a        4     5.100000 1     1
 1: true   k        5     7.000000 1     1
 2: true   k        4     6.000000 1     1
 3: false  a        2     7.100000 4     2
 4: false  a        2     7.100000 2     8
 5: false  a        2     7.100000 5     9
    <bool> <string> <int> <float>  <int> <int>

func (DataFrame) LeftJoin ¶

func (df DataFrame) LeftJoin(b DataFrame, keys ...string) DataFrame

LeftJoin returns a DataFrame containing the left join of two DataFrames.
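As a reminder of what a left join means, here is a standard-library sketch on plain slices. It is not this package's implementation: `left`, `right`, `joined`, and `leftJoin` are hypothetical names, the `HasScore` flag stands in for a NaN cell, and the output row order is an assumption of the sketch.

```go
package main

import "fmt"

// left and right are hypothetical row types sharing the join key "Key".
type left struct {
	Key  string
	Name string
}

type right struct {
	Key   string
	Score int
}

// joined is one output row. HasScore is false when no right row matched,
// which plays the role of a missing (NaN) value.
type joined struct {
	Key      string
	Name     string
	Score    int
	HasScore bool
}

// leftJoin keeps every left row: once per matching right row, or once
// with a missing Score when nothing matches.
func leftJoin(ls []left, rs []right) []joined {
	var out []joined
	for _, l := range ls {
		matched := false
		for _, r := range rs {
			if l.Key == r.Key {
				out = append(out, joined{l.Key, l.Name, r.Score, true})
				matched = true
			}
		}
		if !matched {
			out = append(out, joined{Key: l.Key, Name: l.Name})
		}
	}
	return out
}

func main() {
	people := []left{{"a", "Ana"}, {"b", "Bo"}}
	scores := []right{{"a", 1}, {"a", 2}}
	for _, r := range leftJoin(people, scores) {
		fmt.Printf("%+v\n", r)
	}
}
```

An inner join would drop the unmatched "b" row. A right join is the same operation with the roles of the two inputs swapped.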

func (DataFrame) Maps ¶

func (df DataFrame) Maps(funcs ...func(series.Type, interface{}) interface{}) []map[string]interface{}

Maps returns the slice-of-maps representation of a DataFrame.

func (DataFrame) Mutate ¶

func (df DataFrame) Mutate(ss ...series.Series) DataFrame

Mutate changes a column of the DataFrame with the given Series or adds it as a new column if the column name does not exist.

Example ¶

package main

import (
	"fmt"

	"github.com/dreamsxin/gota/dataframe"
	"github.com/dreamsxin/gota/series"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
	)
	// Change column C with a new one
	mut := df.Mutate(
		series.New([]string{"a", "b", "c", "d"}, series.String, "C"),
	)
	// Add a new column E
	mut2 := df.Mutate(
		series.New([]string{"a", "b", "c", "d"}, series.String, "E"),
	)
	fmt.Println(mut)
	fmt.Println(mut2)

}

Output:

func (DataFrame) Names ¶

func (df DataFrame) Names() []string

Names returns the name of the columns on a DataFrame.

func (DataFrame) Ncol ¶

func (df DataFrame) Ncol() int

Ncol returns the number of columns on a DataFrame.

func (DataFrame) Nrow ¶

func (df DataFrame) Nrow() int

Nrow returns the number of rows on a DataFrame.

func (DataFrame) OuterJoin ¶

func (df DataFrame) OuterJoin(b DataFrame, keys ...string) DataFrame

OuterJoin returns a DataFrame containing the outer join of two DataFrames.

func (DataFrame) Pivot ¶ added in v1.0.10

func (df DataFrame) Pivot(rows []string, columns []string, values []PivotValue) DataFrame

Pivot creates a spreadsheet-style pivot table as a DataFrame.

func (DataFrame) Print ¶ added in v1.0.1

func (df DataFrame) Print(shortRows, shortCols, showDims, showTypes bool) (str string)

func (DataFrame) RBind ¶

func (df DataFrame) RBind(dfb DataFrame) DataFrame

RBind matches the column names of two DataFrames and returns combined rows from both of them.

func (DataFrame) Rapply ¶

func (df DataFrame) Rapply(f func(series.Series) series.Series) DataFrame

Rapply applies the given function to the rows of a DataFrame. Prior to applying the function the elements of each row are cast to a Series of a specific type. In order of priority: String -> Float -> Int -> Bool. This casting also takes place after the function application to equalize the type of the columns.
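The priority rule can be read as "the most general type present in the row wins". The following standard-library sketch illustrates that reading only. `kind` and `rowKind` are hypothetical stand-ins for the series types, and the result for an empty row is an assumption.

```go
package main

import "fmt"

// kind is a local stand-in for the series types named above.
type kind int

// Kinds are declared in ascending priority: a higher value wins when a
// row mixes types (String > Float > Int > Bool).
const (
	boolKind kind = iota
	intKind
	floatKind
	stringKind
)

// String makes kinds print readably.
func (k kind) String() string {
	return [...]string{"bool", "int", "float", "string"}[k]
}

// rowKind returns the highest-priority kind present in the row.
// An empty row yields boolKind, which is an arbitrary choice here.
func rowKind(kinds []kind) kind {
	best := boolKind
	for _, k := range kinds {
		if k > best {
			best = k
		}
	}
	return best
}

func main() {
	fmt.Println(rowKind([]kind{intKind, boolKind}))            // int
	fmt.Println(rowKind([]kind{intKind, floatKind, boolKind})) // float
	fmt.Println(rowKind([]kind{stringKind, floatKind}))        // string
}
```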

func (DataFrame) Records ¶

func (df DataFrame) Records() [][]string

Records returns the string record representation of a DataFrame.

func (DataFrame) Rename ¶

func (df DataFrame) Rename(newname, oldname string) DataFrame

Rename changes the name of one of the columns of a DataFrame

func (DataFrame) RightJoin ¶

func (df DataFrame) RightJoin(b DataFrame, keys ...string) DataFrame

RightJoin returns a DataFrame containing the right join of two DataFrames.

func (DataFrame) Select ¶

func (df DataFrame) Select(indexes SelectIndexes) DataFrame

Select returns a DataFrame containing only the given columns.

Example ¶

package main

import (
	"fmt"

	"github.com/dreamsxin/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
	)
	sel1 := df.Select([]int{0, 2})
	sel2 := df.Select([]string{"A", "C"})
	fmt.Println(sel1)
	fmt.Println(sel2)

}

Output:

[4x2] DataFrame

    A        C
 0: a        5.100000
 1: k        7.000000
 2: k        6.000000
 3: a        7.100000
    <string> <float>

[4x2] DataFrame

    A        C
 0: a        5.100000
 1: k        7.000000
 2: k        6.000000
 3: a        7.100000
    <string> <float>

func (DataFrame) Set ¶

func (df DataFrame) Set(indexes series.Indexes, newvalues DataFrame) DataFrame

Set will update the values of a DataFrame for the rows selected via indexes.

Example ¶

package main

import (
	"fmt"

	"github.com/dreamsxin/gota/dataframe"
	"github.com/dreamsxin/gota/series"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
	)
	df2 := df.Set(
		series.Ints([]int{0, 2}),
		dataframe.LoadRecords(
			[][]string{
				{"A", "B", "C", "D"},
				{"b", "4", "6.0", "true"},
				{"c", "3", "6.0", "false"},
			},
		),
	)
	fmt.Println(df2)

}

Output:

[4x4] DataFrame

    A        B     C        D
 0: b        4     6.000000 true
 1: k        5     7.000000 true
 2: c        3     6.000000 false
 3: a        2     7.100000 false
    <string> <int> <float>  <bool>

func (DataFrame) SetNames ¶

func (df DataFrame) SetNames(colnames ...string) error

SetNames changes the column names of a DataFrame to the ones passed as an argument.

func (DataFrame) Show ¶ added in v1.0.12

func (df DataFrame) Show() error

func (DataFrame) SliceRow ¶ added in v1.0.10

func (df DataFrame) SliceRow(start, end int) DataFrame

func (DataFrame) String ¶

func (df DataFrame) String() (str string)

String implements the Stringer interface for DataFrame

func (DataFrame) Subset ¶

func (df DataFrame) Subset(indexes series.Indexes) DataFrame

Subset returns a subset of the rows of the original DataFrame based on the Series subsetting indexes.

Example ¶

package main

import (
	"fmt"

	"github.com/dreamsxin/gota/dataframe"
)

func main() {
	df := dataframe.LoadRecords(
		[][]string{
			{"A", "B", "C", "D"},
			{"a", "4", "5.1", "true"},
			{"k", "5", "7.0", "true"},
			{"k", "4", "6.0", "true"},
			{"a", "2", "7.1", "false"},
		},
	)
	sub := df.Subset([]int{0, 2})
	fmt.Println(sub)

}

Output:

[2x4] DataFrame

    A        B     C        D
 0: a        4     5.100000 true
 1: k        4     6.000000 true
    <string> <int> <float>  <bool>

func (DataFrame) T ¶ added in v1.0.10

func (df DataFrame) T() mat.Matrix

func (DataFrame) Types ¶

func (df DataFrame) Types() []series.Type

Types returns the types of the columns on a DataFrame.

func (DataFrame) WriteCSV ¶

func (df DataFrame) WriteCSV(w io.Writer, options ...WriteOption) error

WriteCSV writes the DataFrame to the given io.Writer as a CSV file.

func (DataFrame) WriteJSON ¶

func (df DataFrame) WriteJSON(w io.Writer) error

WriteJSON writes the DataFrame to the given io.Writer as a JSON array.

type F ¶

type F struct {
	Colidx     int
	Colname    string
	Comparator series.Comparator
	Comparando interface{}
}

F is the filtering structure

type Groups ¶

type Groups struct {
	Err error
	// contains filtered or unexported fields
}

Groups is the structure generated by GroupBy.

func (Groups) Agg ¶ added in v1.0.2

func (gps Groups) Agg(typ AggregationType, colnames []string) DataFrame

Agg aggregates the grouped DataFrame using a single aggregation type and the given aggregation column names.

func (Groups) Aggregation ¶

func (gps Groups) Aggregation(typs []AggregationType, colnames []string) DataFrame

Aggregation aggregates the grouped DataFrame by the given aggregation types and aggregation column names.
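Conceptually, a group-by followed by an aggregation partitions the rows by key and reduces each partition to one value. The sketch below shows this for a mean using only the standard library. `sale` and `meanBy` are hypothetical names, and nothing here reflects how Groups is implemented.

```go
package main

import (
	"fmt"
	"sort"
)

// sale is a hypothetical row with a grouping column and a numeric column.
type sale struct {
	Region string
	Amount float64
}

// meanBy groups the rows by Region and returns the mean Amount per group.
func meanBy(sales []sale) map[string]float64 {
	sums := map[string]float64{}
	counts := map[string]int{}
	for _, s := range sales {
		sums[s.Region] += s.Amount
		counts[s.Region]++
	}
	means := make(map[string]float64, len(sums))
	for region, sum := range sums {
		means[region] = sum / float64(counts[region])
	}
	return means
}

func main() {
	means := meanBy([]sale{
		{"north", 10}, {"south", 4}, {"north", 20}, {"south", 6},
	})

	// Sort the group keys so the printed order is deterministic.
	regions := make([]string, 0, len(means))
	for region := range means {
		regions = append(regions, region)
	}
	sort.Strings(regions)
	for _, region := range regions {
		fmt.Println(region, means[region])
	}
}
```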

func (Groups) GetGroups ¶

func (g Groups) GetGroups() map[string]DataFrame

GetGroups returns the grouped data frames created by GroupBy

type LoadOption ¶

type LoadOption func(*loadOptions)

LoadOption is the type used to configure the load of elements

func DefaultType ¶

func DefaultType(t series.Type) LoadOption

DefaultType sets the defaultType option for loadOptions.

func DetectTypes ¶

func DetectTypes(b bool) LoadOption

DetectTypes sets the detectTypes option for loadOptions.

func HasHeader ¶

func HasHeader(b bool) LoadOption

HasHeader sets the hasHeader option for loadOptions.

func NaNValues ¶

func NaNValues(nanValues []string) LoadOption

NaNValues sets the nanValues option for loadOptions.

func Names ¶

func Names(names ...string) LoadOption

Names sets the names option for loadOptions.

func WithComments ¶

func WithComments(b rune) LoadOption

WithComments sets the character that marks CSV comment lines, so that those lines are removed while reading.

func WithDelimiter ¶

func WithDelimiter(b rune) LoadOption

WithDelimiter sets a CSV delimiter other than ',', for example '\t'.

func WithLazyQuotes ¶

func WithLazyQuotes(b bool) LoadOption

WithLazyQuotes sets the LazyQuotes option for CSV parsing.
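The names of the three CSV options above suggest that they correspond to the `Comment`, `Comma`, and `LazyQuotes` fields of the standard library's `encoding/csv.Reader`. That mapping is an assumption, not something this page states. The sketch below shows what those fields do on their own, using a hypothetical helper `readRecords` and no code from this package.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readRecords parses text with a custom delimiter and comment marker,
// optionally tolerating stray quotes inside unquoted fields.
func readRecords(text string, delimiter, comment rune, lazyQuotes bool) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(text))
	r.Comma = delimiter       // field separator, ',' by default
	r.Comment = comment       // lines starting with this rune are skipped
	r.LazyQuotes = lazyQuotes // allow a bare quote in an unquoted field
	return r.ReadAll()
}

func main() {
	// Tab-separated input with a comment line and a bare quote in "Ju\"an".
	text := "# generated file\nname\tage\nAna\t22\nJu\"an\t18\n"

	records, err := readRecords(text, '\t', '#', true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, rec := range records {
		fmt.Println(rec)
	}
}
```

With `lazyQuotes` set to false the same input fails, because a bare quote in an unquoted field is a parse error in strict mode.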

func WithSkipColIdxs ¶ added in v1.0.13

func WithSkipColIdxs(m map[int]int) LoadOption

func WithSkipColNames ¶ added in v1.0.13

func WithSkipColNames(m map[string]string) LoadOption

WithSkipCol

func WithTypes ¶

func WithTypes(coltypes map[string]series.Type) LoadOption

WithTypes sets the types option for loadOptions.

type Matrix ¶

type Matrix interface {
	Dims() (r, c int)
	At(i, j int) float64
}

Matrix is an interface which is compatible with gonum's mat.Matrix interface
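Because the interface has only two methods, any type that can report its dimensions and return a cell as float64 satisfies it. The sketch below repeats the interface locally so that it compiles without importing this package. `dense` and `columnSums` are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// Matrix repeats the two-method interface documented above so that this
// example is self-contained.
type Matrix interface {
	Dims() (r, c int)
	At(i, j int) float64
}

// dense is a hypothetical row-major matrix backed by a flat slice.
type dense struct {
	rows, cols int
	data       []float64
}

// Dims returns the number of rows and columns.
func (d dense) Dims() (r, c int) { return d.rows, d.cols }

// At returns the element in row i, column j.
func (d dense) At(i, j int) float64 { return d.data[i*d.cols+j] }

// columnSums walks any Matrix column by column, the access pattern a
// column-oriented loader would need.
func columnSums(m Matrix) []float64 {
	r, c := m.Dims()
	sums := make([]float64, c)
	for j := 0; j < c; j++ {
		for i := 0; i < r; i++ {
			sums[j] += m.At(i, j)
		}
	}
	return sums
}

func main() {
	m := dense{rows: 2, cols: 3, data: []float64{1, 2, 3, 4, 5, 6}}
	fmt.Println(columnSums(m)) // [5 7 9]
}
```

A value such as `m` could presumably be handed to LoadMatrix, since it has the required method set, but that call is not exercised here.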

type Order ¶

type Order struct {
	Colname string
	Reverse bool
}

Order is the ordering structure

func RevSort ¶

func RevSort(colname string) Order

RevSort returns an ordering structure for reverse column sorting.

func Sort ¶

func Sort(colname string) Order

Sort returns an ordering structure for regular column sorting.

type PivotValue ¶ added in v1.0.10

type PivotValue struct {
	Colname         string
	AggregationType AggregationType
}

type SelectIndexes ¶

type SelectIndexes interface{}

SelectIndexes are the supported indexes used for the DataFrame.Select method. Currently supported are:

int              // Matches the given index number
[]int            // Matches all given index numbers
[]bool           // Matches all columns marked as true
string           // Matches the column with the matching column name
[]string         // Matches all columns with the matching column names
Series [Int]     // Same as []int
Series [Bool]    // Same as []bool
Series [String]  // Same as []string
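An `interface{}` selector of this kind is typically resolved with a type switch. The following standard-library sketch shows the idea for the non-Series cases in the list. `resolve` is a hypothetical helper, bounds checking is omitted, and the package's own resolution logic may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// resolve turns a selector into column positions for the given column names.
// The Series variants from the list above are left out of this sketch.
func resolve(names []string, sel interface{}) ([]int, error) {
	// find returns the position of a column name.
	find := func(name string) (int, error) {
		for i, n := range names {
			if n == name {
				return i, nil
			}
		}
		return -1, fmt.Errorf("column %q not found", name)
	}

	switch s := sel.(type) {
	case int:
		return []int{s}, nil
	case []int:
		return s, nil
	case []bool:
		var idx []int
		for i, keep := range s {
			if keep {
				idx = append(idx, i)
			}
		}
		return idx, nil
	case string:
		i, err := find(s)
		if err != nil {
			return nil, err
		}
		return []int{i}, nil
	case []string:
		idx := make([]int, 0, len(s))
		for _, name := range s {
			i, err := find(name)
			if err != nil {
				return nil, err
			}
			idx = append(idx, i)
		}
		return idx, nil
	default:
		return nil, errors.New("unsupported selector type")
	}
}

func main() {
	names := []string{"A", "B", "C", "D"}
	selectors := []interface{}{
		0, []int{0, 2}, []bool{true, false, true, false}, "C", []string{"A", "C"},
	}
	for _, sel := range selectors {
		idx, err := resolve(names, sel)
		fmt.Println(idx, err)
	}
}
```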

type WriteOption ¶

type WriteOption func(*writeOptions)

WriteOption is the type used to configure the writing of elements

func WriteHeader ¶

func WriteHeader(b bool) WriteOption

WriteHeader sets the writeHeader option for writeOptions.
