gambas

v0.1.0 (latest) | Published: Jul 18, 2022 | License: BSD-3-Clause | Imports: 13 | Imported by: 0

Repository

github.com/jpoly1219/gambas

README ¶

gambas

gambas is a data analysis package for Go that provides an intuitive way to manipulate tabular data. The project is inspired by the famous Python library pandas.

Installation

$ go get -u github.com/jpoly1219/gambas

Documentation

The documentation can be found on our pkg.go.dev page.

Project Goals

Provide basic features from the pandas tutorial.
- Providing Series and DataFrame data types
- Reading and writing tabular data
  - Reading CSV files
  - Writing to CSV files
  - Reading Excel files
  - Writing to Excel files
  - Reading JSON files
  - Writing to JSON files
- Selecting a subset of data
  - At, IAt
  - Loc, ILoc
- Plotting
- Creating new columns derived from existing columns
  - Creating new columns
  - Applying operations to the new column
  - Renaming columns
- Calculating summary statistics
  - Mean, median, standard deviation
  - Min, max, quartiles
  - Count, describe
- Reshaping the layout of tables
  - Sorting by index
  - Sorting by values
  - Sorting by given index
  - Groupby
  - Pivot (long to wide format)
  - PivotTable (long to wide format)
  - Melt (wide to long format)
- Combining data from multiple tables
  - Concatenate
  - Merge
- Handling time series data
  - Timestamp type
  - Timestamp type methods
  - ToDatetime
- Manipulating textual data
- Multiindex
- Documentation (pkg.go.dev page)
- Project website
- Project logo

Philosophy

gambas was created to serve the needs of Go developers who wanted a robust data analysis package. pandas is an amazing tool, and is considered the industry standard when it comes to data organization and manipulation.

We didn't have a solid alternative in the Go realm. According to the Go Developer Survey 2021 Results, missing critical libraries was one of the most common barriers to using Go. You may have used Go for some time now, but you might have missed some of the libraries you used in Python. gambas aims to scratch that itch. You will be able to tap into the superpowers of pandas while using your favorite language, Go.

Go is a very attractive language with a very loyal userbase. It provides a pleasant developer experience with its simple syntax and strong typing. However, Go currently tends to be skewed towards developing services. 49% of projects written in Go are API/RPC services, and another 10% are for web services. The ultimate goal for gambas is to allow the Go programming language to be a major player in the data analysis field.

Documentation ¶

Index ¶

func WriteCsv(df DataFrame, pathToFile string) (os.FileInfo, error)
func WriteExcel(df DataFrame, pathToFile string) (os.FileInfo, error)
func WriteJson(df DataFrame, pathToFile string) (os.FileInfo, error)
type DataFrame
type GroupBy
- func (gb *GroupBy) Agg(targetCol []string, aggFunc StatsFunc) (DataFrame, error)
type Index
type IndexData
- func CreateRangeIndex(length int) IndexData
type Series
- func NewSeries(data []interface{}, name string, index *IndexData) (Series, error)
type StatsFunc
type StatsResult

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func WriteCsv ¶

func WriteCsv(df DataFrame, pathToFile string) (os.FileInfo, error)

WriteCsv writes a DataFrame object to CSV file. It is recommended to generate pathToFile using `filepath.Join`.
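
As a rough illustration of the `filepath.Join` recommendation, the sketch below builds an output path with `filepath.Join` and writes a few rows with the standard library's `encoding/csv`. It does not call gambas: `writeRows` and `demoWrite` are hypothetical helpers that only mirror the `(os.FileInfo, error)` return shape of WriteCsv.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"path/filepath"
)

// writeRows is a hypothetical stand-in for a CSV writer (not gambas code).
// It writes rows to path and returns the os.FileInfo of the written file.
func writeRows(path string, rows [][]string) (os.FileInfo, error) {
	f, err := os.Create(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	w := csv.NewWriter(f)
	// WriteAll writes every record and flushes before returning.
	if err := w.WriteAll(rows); err != nil {
		return nil, err
	}
	return f.Stat()
}

// demoWrite builds the target path with filepath.Join, so the separator is
// correct on every operating system, and returns the size of the written file.
func demoWrite() (int64, error) {
	dir, err := os.MkdirTemp("", "csvdemo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "out", "people.csv")
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return 0, err
	}
	info, err := writeRows(path, [][]string{{"name", "age"}, {"Alice", "30"}})
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	size, err := demoWrite()
	fmt.Println("bytes written:", size, "error:", err)
}
```

Building the path from segments avoids hard-coding `/` or `\`, which is presumably why the documentation recommends it.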

func WriteExcel ¶ added in v0.1.0

func WriteExcel(df DataFrame, pathToFile string) (os.FileInfo, error)

WriteExcel writes a DataFrame object into an Excel file.

func WriteJson ¶

func WriteJson(df DataFrame, pathToFile string) (os.FileInfo, error)

WriteJson writes a DataFrame object to a JSON file.

Types ¶

type DataFrame ¶

type DataFrame struct {
	// contains filtered or unexported fields
}

DataFrame type represents a 2D tabular dataset. A DataFrame object is comprised of multiple Series objects.

func NewDataFrame ¶

func NewDataFrame(data [][]interface{}, columns []string, indexCols []string) (DataFrame, error)

NewDataFrame creates a new DataFrame object from given parameters. Generally, NewDataFrameFromFile will be used more often.

func ReadCsv ¶

func ReadCsv(pathToFile string, indexCols []string) (DataFrame, error)

ReadCsv reads a CSV file and returns a new DataFrame object. It is recommended to generate pathToFile using `filepath.Join`.

func ReadExcel ¶ added in v0.1.0

func ReadExcel(pathToFile, sheetName string, axis int) (DataFrame, error)

ReadExcel reads an Excel file and converts it to a DataFrame object. The axis depends on the layout of the data. Row-based data, where each group represents a row, uses axis=0. Column-based data, where each group represents a column, uses axis=1.

func ReadJsonByColumns ¶

func ReadJsonByColumns(pathToFile string, indexCols []string) (DataFrame, error)

ReadJsonByColumns reads a JSON file and returns a new DataFrame object. It is recommended to generate pathToFile using `filepath.Join`. The JSON file should be in this format: {"col1":[val1, val2, ...], "col2":[val1, val2, ...], ...}. You can either set a column to be the index or pass nil as indexCols. If nil, a new RangeIndex will be created. Your index column should not have any missing values. Order of columns is not guaranteed, but the index column will always come first.
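
To make the column-oriented format concrete, here is a minimal standard-library sketch that decodes it into a map of column name to values. It is not the gambas implementation; `decodeColumns` and `sortedNames` are hypothetical helpers. Go map iteration order is unspecified, which is one plausible reason (an assumption, not confirmed by the source) why column order is not guaranteed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// decodeColumns parses {"col1":[...], "col2":[...]} into a map keyed by
// column name. JSON numbers decode to float64 when the target is interface{}.
func decodeColumns(data []byte) (map[string][]interface{}, error) {
	cols := make(map[string][]interface{})
	if err := json.Unmarshal(data, &cols); err != nil {
		return nil, err
	}
	return cols, nil
}

// sortedNames returns the column names in a deterministic (sorted) order,
// because ranging over a Go map yields keys in an unspecified order.
func sortedNames(cols map[string][]interface{}) []string {
	names := make([]string, 0, len(cols))
	for name := range cols {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	raw := []byte(`{"name":["Alice","Bob"],"age":[30,25]}`)
	cols, err := decodeColumns(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, name := range sortedNames(cols) {
		fmt.Println(name, cols[name])
	}
}
```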

func ReadJsonStream ¶

func ReadJsonStream(pathToFile string, indexCols []string) (DataFrame, error)

ReadJsonStream reads a JSON stream and returns a new DataFrame object. The JSON file should be in this format: {"col1":val1, "col2":val2, ...}{"col1":val1, "col2":val2, ...}
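
A JSON stream in this sense is a sequence of objects written back to back, one per row. The following sketch shows how such input can be read with `json.Decoder` from the standard library; `decodeStream` is a hypothetical helper and not gambas code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// decodeStream reads concatenated JSON objects, one per row, until EOF.
func decodeStream(s string) ([]map[string]interface{}, error) {
	dec := json.NewDecoder(strings.NewReader(s))
	var rows []map[string]interface{}
	for {
		var row map[string]interface{}
		err := dec.Decode(&row)
		if err == io.EOF {
			break // no more objects in the stream
		}
		if err != nil {
			return nil, err
		}
		rows = append(rows, row)
	}
	return rows, nil
}

func main() {
	rows, err := decodeStream(`{"col1":1,"col2":"x"}{"col1":2,"col2":"y"}`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(len(rows), "rows decoded")
}
```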

func (*DataFrame) ColAdd ¶

func (df *DataFrame) ColAdd(colname string, value float64) (DataFrame, error)

ColAdd() adds the given value to each element in the specified column.

func (*DataFrame) ColDiv ¶

func (df *DataFrame) ColDiv(colname string, value float64) (DataFrame, error)

ColDiv() divides each element in the specified column by the given value.

func (*DataFrame) ColEq ¶

func (df *DataFrame) ColEq(colname string, value float64) (DataFrame, error)

ColEq() checks if each element in the specified column is equal to the given value.

func (*DataFrame) ColGt ¶

func (df *DataFrame) ColGt(colname string, value float64) (DataFrame, error)

ColGt() checks if each element in the specified column is greater than the given value.

func (*DataFrame) ColLt ¶

func (df *DataFrame) ColLt(colname string, value float64) (DataFrame, error)

ColLt() checks if each element in the specified column is less than the given value.

func (*DataFrame) ColMod ¶

func (df *DataFrame) ColMod(colname string, value float64) (DataFrame, error)

ColMod() applies modulus calculations on each element in the specified column, returning the remainder.

func (*DataFrame) ColMul ¶

func (df *DataFrame) ColMul(colname string, value float64) (DataFrame, error)

ColMul() multiplies each element in the specified column by the given value.

func (*DataFrame) ColSub ¶

func (df *DataFrame) ColSub(colname string, value float64) (DataFrame, error)

ColSub() subtracts the given value from each element in the specified column.

func (*DataFrame) DropNaN ¶

func (df *DataFrame) DropNaN(axis int) (DataFrame, error)

DropNaN drops rows or columns with NaN values. Specify axis to choose whether to remove rows with NaN or columns with NaN. axis=0 is row, axis=1 is column.
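
The axis convention can be illustrated on a plain `[][]float64` table. This is a sketch under the assumption of a rectangular, row-major table; `dropNaN`, `hasNaN`, and `sample` are hypothetical helpers, not part of gambas.

```go
package main

import (
	"fmt"
	"math"
)

// hasNaN reports whether any value in vals is NaN.
func hasNaN(vals []float64) bool {
	for _, v := range vals {
		if math.IsNaN(v) {
			return true
		}
	}
	return false
}

// dropNaN removes rows (axis 0) or columns (any other axis value, meant as 1)
// that contain at least one NaN. The table must be rectangular.
func dropNaN(table [][]float64, axis int) [][]float64 {
	if len(table) == 0 {
		return nil
	}
	if axis == 0 {
		var out [][]float64
		for _, row := range table {
			if !hasNaN(row) {
				out = append(out, row)
			}
		}
		return out
	}
	// Mark the columns that contain no NaN in any row.
	keep := make([]bool, len(table[0]))
	for c := range keep {
		keep[c] = true
		for _, row := range table {
			if math.IsNaN(row[c]) {
				keep[c] = false
				break
			}
		}
	}
	// Copy only the kept columns into the result.
	out := make([][]float64, len(table))
	for r, row := range table {
		for c, v := range row {
			if keep[c] {
				out[r] = append(out[r], v)
			}
		}
	}
	return out
}

// sample returns a 3x2 table with one NaN in the second row, second column.
func sample() [][]float64 {
	return [][]float64{{1, 2}, {3, math.NaN()}, {5, 6}}
}

func main() {
	fmt.Println("axis=0:", dropNaN(sample(), 0))
	fmt.Println("axis=1:", dropNaN(sample(), 1))
}
```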

func (*DataFrame) GroupBy ¶

func (df *DataFrame) GroupBy(by ...string) (GroupBy, error)

GroupBy groups selected columns in a DataFrame object and returns a GroupBy object.

func (*DataFrame) Head ¶

func (df *DataFrame) Head(howMany int)

Head prints the first n items in the dataframe.

func (*DataFrame) Loc ¶

func (df *DataFrame) Loc(cols []string, rows ...[]interface{}) (DataFrame, error)

Loc indexes the DataFrame object given a slice of row and column labels.

func (*DataFrame) LocCols ¶

func (df *DataFrame) LocCols(cols ...string) (DataFrame, error)

LocCols returns a set of columns as a new DataFrame object, given a list of labels.

func (*DataFrame) LocColsItems ¶

func (df *DataFrame) LocColsItems(cols ...string) ([][]interface{}, error)

LocColsItems will return a slice of columns. Use this over LocCols if you want to extract the items directly instead of getting a DataFrame object.

func (*DataFrame) LocRows ¶

func (df *DataFrame) LocRows(rows ...[]interface{}) (DataFrame, error)

LocRows returns a set of rows as a new DataFrame object, given a list of labels.

func (*DataFrame) LocRowsItems ¶

func (df *DataFrame) LocRowsItems(rows ...[]interface{}) ([][]interface{}, error)

LocRowsItems will return a slice of rows. Use this over LocRows if you want to extract the items directly instead of getting a DataFrame object.

func (*DataFrame) MarshalJSON ¶

func (df *DataFrame) MarshalJSON() ([]byte, error)

MarshalJSON is used to implement the json.Marshaler interface.

func (*DataFrame) Melt ¶

func (df *DataFrame) Melt(colName, valueName string) (DataFrame, error)

Melt converts the table from wide to long format. Use Melt to revert to pre-Pivot format.
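
For readers unfamiliar with the wide-to-long transformation, here is a small sketch of the idea on plain slices. It is an illustration only: `longRow` and `melt` are hypothetical names, and the row ordering gambas produces may differ.

```go
package main

import "fmt"

// longRow is one record of the long format: an index label, the name of the
// original column, and the cell value.
type longRow struct {
	Index  string
	Column string
	Value  float64
}

// melt turns a wide table (one row per index label, one column per variable)
// into a long table with one row per (index, column) pair.
// wide must have len(index) rows and len(columns) values per row.
func melt(index, columns []string, wide [][]float64) []longRow {
	var out []longRow
	for r, idx := range index {
		for c, col := range columns {
			out = append(out, longRow{idx, col, wide[r][c]})
		}
	}
	return out
}

func main() {
	long := melt(
		[]string{"2021", "2022"},
		[]string{"apples", "pears"},
		[][]float64{{10, 4}, {6, 8}},
	)
	for _, row := range long {
		fmt.Println(row.Index, row.Column, row.Value)
	}
}
```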

func (*DataFrame) MergeDfsHorizontally ¶ added in v0.1.0

func (df *DataFrame) MergeDfsHorizontally(target DataFrame) (DataFrame, error)

MergeDfsHorizontally merges two DataFrame objects side by side. The target DataFrame will always be appended to the right of the source DataFrame. Index will reset and become a RangeIndex.

func (*DataFrame) MergeDfsVertically ¶ added in v0.1.0

func (df *DataFrame) MergeDfsVertically(target DataFrame) (DataFrame, error)

MergeDfsVertically stacks two DataFrame objects vertically.

func (*DataFrame) NewCol ¶

func (df *DataFrame) NewCol(colname string, data []interface{}) (DataFrame, error)

NewCol creates a new column with the given data and column name. To create a blank column, pass in a slice with empty string values like so: []interface{}{"", "", "", ...}

func (*DataFrame) NewDerivedCol ¶

func (df *DataFrame) NewDerivedCol(colname, srcCol string) (DataFrame, error)

NewDerivedCol creates a new column derived from an existing column. It copies over the data from a column named srcCol into a new column. You can then apply column operations such as ColAdd to the new column.

func (*DataFrame) Pivot ¶

func (df *DataFrame) Pivot(column, value string) (DataFrame, error)

Pivot returns an organized dataframe that has values corresponding to the index and the given column.

func (*DataFrame) PivotTable ¶

func (df *DataFrame) PivotTable(index, column, value string, aggFunc StatsFunc) (DataFrame, error)

PivotTable rearranges the data by a given index and column. Each value will be aggregated via an aggregation function. Pick three columns from the DataFrame, each to serve as the index, column, and value. PivotTable ignores NaN values.

func (*DataFrame) Print ¶

func (df *DataFrame) Print()

Print prints all data in a DataFrame object.

func (*DataFrame) PrintRange ¶

func (df *DataFrame) PrintRange(start, end int)

PrintRange prints the rows in a given range. Index starts at 0. For example, to print 3 rows starting at index 2, use PrintRange(2, 5).

func (*DataFrame) RenameCol ¶

func (df *DataFrame) RenameCol(colnames map[string]string) error

RenameCol renames columns in a DataFrame.

func (*DataFrame) SortByColumns ¶

func (df *DataFrame) SortByColumns()

SortByColumns sorts the columns of the DataFrame object.

func (*DataFrame) SortByIndex ¶

func (df *DataFrame) SortByIndex(ascending bool) error

SortByIndex sorts the items by index.

func (*DataFrame) SortByValues ¶

func (df *DataFrame) SortByValues(by string, ascending bool) error

SortByValues sorts the items by values in a selected series.

func (*DataFrame) SortIndexColFirst ¶

func (df *DataFrame) SortIndexColFirst()

SortIndexColFirst puts the index column at the front.

func (*DataFrame) Tail ¶

func (df *DataFrame) Tail(howMany int)

Tail prints the last n items in the dataframe.

type GroupBy ¶

type GroupBy struct {
	// contains filtered or unexported fields
}

GroupBy type is an intermediary struct that is created after running DataFrame.GroupBy(). It holds the necessary data for applying operations such as GroupBy.Agg().

func (*GroupBy) Agg ¶

func (gb *GroupBy) Agg(targetCol []string, aggFunc StatsFunc) (DataFrame, error)

Agg aggregates data in the GroupBy object using the given aggFunc.
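
The group-then-aggregate idea can be sketched without gambas. The `StatsFunc` and `StatsResult` declarations below are local copies of the shapes documented on this page; `mean` and `groupAgg` are hypothetical helpers, and the real GroupBy internals may work differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the documented type shapes, so the sketch is self-contained.
type StatsResult struct {
	UsedFunc string
	Result   float64
	Err      error
}

type StatsFunc func(dataset []interface{}) StatsResult

// mean is a simple StatsFunc that accepts float64 values only.
func mean(dataset []interface{}) StatsResult {
	if len(dataset) == 0 {
		return StatsResult{UsedFunc: "mean", Err: errors.New("empty dataset")}
	}
	sum := 0.0
	for _, v := range dataset {
		f, ok := v.(float64)
		if !ok {
			return StatsResult{UsedFunc: "mean", Err: fmt.Errorf("non-float64 value %v", v)}
		}
		sum += f
	}
	return StatsResult{UsedFunc: "mean", Result: sum / float64(len(dataset))}
}

// groupAgg buckets values by their key and applies agg to every bucket.
// keys and values must have the same length.
func groupAgg(keys []string, values []interface{}, agg StatsFunc) map[string]StatsResult {
	groups := make(map[string][]interface{})
	for i, k := range keys {
		groups[k] = append(groups[k], values[i])
	}
	out := make(map[string]StatsResult, len(groups))
	for k, vals := range groups {
		out[k] = agg(vals)
	}
	return out
}

func main() {
	res := groupAgg(
		[]string{"a", "b", "a"},
		[]interface{}{1.0, 10.0, 3.0},
		mean,
	)
	fmt.Println(res["a"].Result, res["b"].Result)
}
```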

type Index ¶

type Index struct {
	// contains filtered or unexported fields
}

Index stores the index values of a series and dataframe. The 0th element must be the ID of the index. For example, if your data includes a column of names that you have set to be the index, the index may look like this: Index{0, "Alice"}, Index{1, "Bob"}, Index{2, "Charlie"}. Index{} with more than one value (not including the ID) is considered a multi-index.

type IndexData ¶

type IndexData struct {
	// contains filtered or unexported fields
}

IndexData type is used to hold index information of a Series or a DataFrame.

func CreateRangeIndex ¶

func CreateRangeIndex(length int) IndexData

CreateRangeIndex takes the length of an Index and creates a RangeIndex. RangeIndex is an index that spans from 0 to the length of the index.

func (IndexData) Len ¶

func (id IndexData) Len() int

Len is used to implement sort.Interface.

func (IndexData) Less ¶

func (id IndexData) Less(i, j int) bool

Less is used to implement sort.Interface.

func (IndexData) Swap ¶

func (id IndexData) Swap(i, j int)

Swap is used to implement sort.Interface.
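
For context, a type becomes sortable with `sort.Sort` once it provides Len, Less, and Swap. The sketch below shows the pattern on a hypothetical `labels` type; it does not reproduce how IndexData compares its entries.

```go
package main

import (
	"fmt"
	"sort"
)

// labels is a hypothetical index-like type: a list of string labels.
type labels []string

// Len, Less, and Swap together satisfy sort.Interface.
func (l labels) Len() int           { return len(l) }
func (l labels) Less(i, j int) bool { return l[i] < l[j] }
func (l labels) Swap(i, j int)      { l[i], l[j] = l[j], l[i] }

// sortedLabels returns a sorted copy and leaves the input untouched.
func sortedLabels(in []string) []string {
	out := append([]string(nil), in...)
	sort.Sort(labels(out))
	return out
}

func main() {
	fmt.Println(sortedLabels([]string{"Charlie", "Alice", "Bob"}))
}
```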

type Series ¶

type Series struct {
	// contains filtered or unexported fields
}

Series type represents a column of data.

func NewSeries ¶

func NewSeries(data []interface{}, name string, index *IndexData) (Series, error)

NewSeries creates a new Series object from given parameters. Generally, NewSeriesFromFile will be used more often. The index parameter can be set to nil when calling NewSeries on its own. This parameter is for passing in the DataFrame's index data in NewDataFrame.

func (*Series) At ¶

func (s *Series) At(ind ...interface{}) (interface{}, error)

At returns an element at a given index. For multiindex, you need to pass in the whole index tuple.

func (*Series) Count ¶

func (s *Series) Count() StatsResult

Count counts the number of non-NA elements in a column.

func (*Series) Describe ¶

func (s *Series) Describe() ([]float64, error)

Describe runs through the most commonly used statistics functions and prints the output.

func (*Series) Head ¶

func (s *Series) Head(howMany int)

Head prints the first n items in the series.

func (*Series) IAt ¶

func (s *Series) IAt(ind int) (interface{}, error)

IAt returns an element at a given integer index.

func (*Series) ILoc ¶

func (s *Series) ILoc(min, max int) ([]interface{}, error)

ILoc returns an array of elements at a given integer index range.

func (*Series) IndexHasDuplicateValues ¶

func (s *Series) IndexHasDuplicateValues() (bool, error)

IndexHasDuplicateValues checks if the Series has duplicate index values.

func (Series) Len ¶

func (s Series) Len() int

Len is used to implement sort.Interface.

func (Series) Less ¶

func (s Series) Less(i, j int) bool

Less is used to implement sort.Interface.

func (*Series) Loc ¶

func (s *Series) Loc(idx ...[]interface{}) (Series, error)

Loc returns a range of data at given rows.

func (*Series) LocItems ¶

func (s *Series) LocItems(idx ...[]interface{}) ([]interface{}, error)

LocItems returns a slice of data at given rows. Use this over Loc if you want to extract the items directly instead of getting a Series object.

func (*Series) Max ¶

func (s *Series) Max() StatsResult

Max returns the largest element in a column.

func (*Series) Mean ¶

func (s *Series) Mean() StatsResult

Mean returns the mean of the elements in a column.

func (*Series) Median ¶

func (s *Series) Median() StatsResult

Median returns the median of the elements in a column.

func (*Series) Min ¶

func (s *Series) Min() StatsResult

Min returns the smallest element in a column.

func (*Series) Print ¶

func (s *Series) Print()

Print prints all data in a Series object.

func (*Series) PrintRange ¶

func (s *Series) PrintRange(start, end int)

PrintRange prints the elements in a given range. Index starts at 0. For example, to print 3 elements starting at index 2, use PrintRange(2, 5).

func (*Series) Q1 ¶

func (s *Series) Q1() StatsResult

Q1 returns the lower quartile (25%) of the elements in a column. This does not include the median during calculation.

func (*Series) Q2 ¶

func (s *Series) Q2() StatsResult

Q2 returns the middle quartile (50%) of the elements in a column. This accomplishes the same thing as s.Median().

func (*Series) Q3 ¶

func (s *Series) Q3() StatsResult

Q3 returns the upper quartile (75%) of the elements in a column. This does not include the median during calculation.
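
One common reading of "does not include the median during calculation" is the exclusive method: for an odd number of elements, the middle element is left out of both halves before taking the median of each half. The sketch below implements that reading as an assumption; `median` and `quartiles` are hypothetical helpers and may not match the gambas results for every input.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// median returns the median of an already sorted, non-empty slice.
func median(sorted []float64) float64 {
	n := len(sorted)
	if n%2 == 1 {
		return sorted[n/2]
	}
	return (sorted[n/2-1] + sorted[n/2]) / 2
}

// quartiles returns Q1, Q2, and Q3 using the exclusive method: when the
// number of values is odd, the middle value belongs to neither half.
func quartiles(data []float64) (q1, q2, q3 float64, err error) {
	if len(data) < 2 {
		return 0, 0, 0, errors.New("need at least two values")
	}
	s := append([]float64(nil), data...) // sort a copy, not the input
	sort.Float64s(s)
	half := len(s) / 2
	lower := s[:half]
	upper := s[len(s)-half:]
	return median(lower), median(s), median(upper), nil
}

func main() {
	q1, q2, q3, err := quartiles([]float64{7, 1, 3, 5, 2, 6, 4})
	fmt.Println(q1, q2, q3, err)
}
```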

func (*Series) RenameCol ¶

func (s *Series) RenameCol(newName string)

RenameCol renames the series.

func (*Series) RenameIndex ¶

func (s *Series) RenameIndex(newNames map[string]string) error

RenameIndex renames the index of the series. Input should be a map, where key is the index name to change and value is a new name.

func (*Series) SortByGivenIndex ¶

func (s *Series) SortByGivenIndex(index IndexData) error

SortByGivenIndex sorts the Series by a given index.

func (*Series) SortByIndex ¶

func (s *Series) SortByIndex(ascending bool) error

SortByIndex sorts the elements in a series by the index.

func (*Series) SortByValues ¶

func (s *Series) SortByValues(ascending bool) error

SortByValues sorts the Series by its values.

func (*Series) Std ¶

func (s *Series) Std() StatsResult

Std returns the sample standard deviation of the elements in a column.
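
The sample standard deviation divides the sum of squared deviations by n-1 rather than n. A minimal sketch of that formula follows; `sampleStd` is a hypothetical helper working on plain float64 values, not the gambas function.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// sampleStd computes sqrt( sum((x - mean)^2) / (n - 1) ).
func sampleStd(data []float64) (float64, error) {
	n := len(data)
	if n < 2 {
		return 0, errors.New("need at least two values")
	}
	mean := 0.0
	for _, v := range data {
		mean += v
	}
	mean /= float64(n)

	sumSq := 0.0
	for _, v := range data {
		d := v - mean
		sumSq += d * d
	}
	return math.Sqrt(sumSq / float64(n-1)), nil
}

func main() {
	std, err := sampleStd([]float64{2, 4, 4, 4, 5, 5, 7, 9})
	fmt.Println(std, err)
}
```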

func (Series) Swap ¶

func (s Series) Swap(i, j int)

Swap is used to implement sort.Interface.

func (*Series) Tail ¶

func (s *Series) Tail(howMany int)

Tail prints the last n items in the series.

func (*Series) ValueCounts ¶

func (s *Series) ValueCounts() (Series, error)

ValueCounts returns a Series containing the number of occurrences of each unique value in a given Series.
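
Counting occurrences per unique value is usually done with a map. The sketch below shows the idea on a string slice; `valueCounts` is a hypothetical helper that returns a map rather than a Series.

```go
package main

import "fmt"

// valueCounts returns how many times each distinct value occurs.
func valueCounts(values []string) map[string]int {
	counts := make(map[string]int)
	for _, v := range values {
		counts[v]++
	}
	return counts
}

func main() {
	fmt.Println(valueCounts([]string{"red", "blue", "red", "green", "red"}))
}
```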

type StatsFunc ¶

type StatsFunc func(dataset []interface{}) StatsResult

StatsFunc represents any function that accepts dataset as input and returns StatsResult as output.

type StatsResult ¶

type StatsResult struct {
	UsedFunc string
	Result   float64
	Err      error
}

StatsResult holds the results of calculation from a statistics function such as Mean or Median.
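
Because StatsResult carries its own Err field, a caller would normally check Err before reading Result. The sketch below defines local copies of the two documented types and a custom function with the StatsFunc shape; `valueRange` is hypothetical and assumes the dataset holds float64 values.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// Local copies of the documented type shapes, so the sketch is self-contained.
type StatsResult struct {
	UsedFunc string
	Result   float64
	Err      error
}

type StatsFunc func(dataset []interface{}) StatsResult

// valueRange returns max - min of the dataset. Any non-float64 element is
// reported through the Err field instead of a second return value.
func valueRange(dataset []interface{}) StatsResult {
	res := StatsResult{UsedFunc: "range"}
	if len(dataset) == 0 {
		res.Err = errors.New("empty dataset")
		return res
	}
	lo, hi := math.Inf(1), math.Inf(-1)
	for _, v := range dataset {
		f, ok := v.(float64)
		if !ok {
			res.Err = fmt.Errorf("unsupported value %v (%T)", v, v)
			return res
		}
		lo = math.Min(lo, f)
		hi = math.Max(hi, f)
	}
	res.Result = hi - lo
	return res
}

// Compile-time check that valueRange has the StatsFunc signature.
var _ StatsFunc = valueRange

func main() {
	res := valueRange([]interface{}{3.0, 9.0, 5.0})
	if res.Err != nil {
		fmt.Println("error:", res.Err)
		return
	}
	fmt.Println(res.UsedFunc, res.Result)
}
```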

func Count ¶

func Count(dataset []interface{}) StatsResult

Count counts the number of non-NA elements in a column.

func Max ¶

func Max(dataset []interface{}) StatsResult

Max returns the largest element in a column.

func Mean ¶

func Mean(dataset []interface{}) StatsResult

Mean returns the mean of the elements in a column.

func Median ¶

func Median(dataset []interface{}) StatsResult

Median returns the median of the elements in a column.

func Min ¶

func Min(dataset []interface{}) StatsResult

Min returns the smallest element in a column.

func Q1 ¶

func Q1(dataset []interface{}) StatsResult

Q1 returns the lower quartile (25%) of the elements in a column. This does not include the median during calculation.

func Q2 ¶

func Q2(dataset []interface{}) StatsResult

Q2 returns the middle quartile (50%) of the elements in a column. This accomplishes the same thing as Median().

func Q3 ¶

func Q3(dataset []interface{}) StatsResult

Q3 returns the upper quartile (75%) of the elements in a column. This does not include the median during calculation.

func Std ¶

func Std(dataset []interface{}) StatsResult

Std returns the sample standard deviation of the elements in a column.
