tabula

package

v0.57.0 Latest Latest Go to latest Published: Sep 3, 2024 License: BSD-3-Clause, BSD-3-Clause Imports: 14 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

git.sr.ht/~shulhan/pakakeh.go

Links

Open Source Insights

README ¶

cover.run go

Package tabula is a Go library for working with rows, columns, or matrix (table), or in another terms working with data set.

Overview

Go's slice gave a flexible way to manage sequence of data in one type, but what if you want to manage a sequence of value but with different type of data? Or manage a bunch of values like a table?

You can use this library to manage sequence of value with different type and manage data in two dimensional tuple.

Terminology

Here are some terminologies that we used in developing this library, which may help reader understand the internal and API.

Record is a single cell in row or column, or the smallest building block of dataset.

Row is a horizontal representation of records in dataset.

Column is a vertical representation of records in dataset. Each column has a unique name and has the same type data.

Dataset is a collection of rows and columns.

Given those definitions we can draw the representation of rows, columns, or matrix:

        COL-0  COL-1 ...  COL-x
ROW-0: record record ... record
ROW-1: record record ... record
...
ROW-y: record record ... record

What make this package different from other dataset packages?

Record Type

There are only three valid type in record: int64, float64, and string.

Each record is a pointer to interface value. Which means,

Switching between rows to columns mode, or vice versa, is only a matter of pointer switching, no memory relocations.
When using matrix mode, additional memory is used only to allocate slice, the record in each rows and columns is shared.

Dataset Mode

Tabula has three mode for dataset: rows, columns, or matrix.

For example, given a table of data,

col1,col2,col3
a,b,c
1,2,3

When in "rows" mode, each line is saved in its own slice, resulting in Rows:
```
Rows[0]: [a b c]
Rows[1]: [1 2 3]
```
Columns is used only to save record metadata: column name, type, flag and value space.
When in "columns" mode, each line saved in columns, resulting in Columns:
```
Columns[0]: {col1 0 0 [] [a 1]}
Columns[1]: {col2 0 0 [] [b 2]}
Columns[1]: {col3 0 0 [] [c 3]}
```
Each column will contain metadata including column name, type, flag, and value space (all possible value that may contain in column value).

Rows in "columns" mode is empty.
When in "matrix" mode, each record is saved both in row and column using shared pointer to record.

Matrix mode consume more memory by allocating two slice in rows and columns, but give flexible way to manage records.

Features

Switching between rows and columns mode.
Random pick rows with or without replacement.
Random pick columns with or without replacement.
Select column from dataset by index.
Sort columns by index, or indirect sort.
Split rows value by numeric. For example, given two numeric rows,
```
A: {1,2,3,4}
B: {5,6,7,8}
```
if we split row by value 7, the data will splitted into left set
```
A': {1,2}
B': {5,6}
```
and the right set would be
```
A'': {3,4}
B'': {7,8}
```
Split rows by string. For example, given two rows,
```
X: [A,B,A,B,C,D,C,D]
Y: [1,2,3,4,5,6,7,8]
```
if we split the rows with value set [A,C], the data will splitted into left set which contain all rows that have A or C,
```
		X': [A,A,C,C]
		Y': [1,3,5,7]
```
and the right set, excluded set, will contain all rows which is not A or C,
```
		X'': [B,B,D,D]
		Y'': [2,4,6,8]
```
Select row where. Select row at column index x where their value is equal to y (an analogy to select where in SQL). For example, given a rows of dataset,
```
ROW-1: {1,A}
ROW-2: {2,B}
ROW-3: {3,A}
ROW-4: {4,C}
```
we can select row where the second column contain 'A', which result in,
```
ROW-1: {1,A}
ROW-3: {3,A}
```

Documentation ¶

Overview ¶

Package tabula is a Go library for working with rows, columns, or matrix (table), or in another terms working with data set.

Introduction ¶

Go's slice gave a flexible way to manage sequence of data in one type, but what if you want to manage a sequence of value but with different type of data? Or manage a bunch of values like a table?

You can use this library to manage sequence of value with different type and manage data in two dimensional tuple.

Terminology ¶

Here are some terminologies that we used in developing this library, which may help reader understand the internal and API.

Record is a single cell in row or column, or the smallest building block of dataset.

Row is a horizontal representation of records in dataset.

Column is a vertical representation of records in dataset. Each column has a unique name and has the same type data.

Dataset is a collection of rows and columns.

Given those definitions we can draw the representation of rows, columns, or matrix:

        COL-0  COL-1 ...  COL-x
ROW-0: record record ... record
ROW-1: record record ... record
...
ROW-y: record record ... record

Record Type ¶

There are only three valid type in record: int64, float64, and string.

Dataset Mode ¶

Tabula has three mode for dataset: rows, columns, or matrix.

For example, given a table of data,

col1,col2,col3
a,b,c
1,2,3

"rows" mode is where each line saved in its own slice, resulting in Rows:

Rows[0]: [a b c]
Rows[1]: [1 2 3]

"columns" mode is where each line saved by columns, resulting in Columns:

Columns[0]: {col1 0 0 [] [a 1]}
Columns[1]: {col2 0 0 [] [b 2]}
Columns[1]: {col3 0 0 [] [c 3]}

Unlike rows mode, each column contain metadata including column name, type, flag, and value space (all possible value that _may_ contain in column value).

"matrix" mode is where each record saved both in row and column.

Matrix mode consume more memory but give a flexible way to manage records.

Index ¶

Constants
Variables
func RandomPickColumns(dataset DatasetInterface, n int, dup bool, excludeIdx []int) (picked DatasetInterface, unpicked DatasetInterface, pickedIdx []int, ...)
func RandomPickRows(dataset DatasetInterface, n int, duplicate bool) (picked DatasetInterface, unpicked DatasetInterface, pickedIdx []int, ...)
func ReadDatasetConfig(ds interface{}, fcfg string) (e error)
func SortColumnsByIndex(di DatasetInterface, sortedIdx []int)
func SplitRowsByCategorical(di DatasetInterface, colidx int, splitVal []string) (splitIn DatasetInterface, splitEx DatasetInterface, e error)
func SplitRowsByNumeric(di DatasetInterface, colidx int, splitVal float64) (splitLess DatasetInterface, splitGreater DatasetInterface, e error)
func SplitRowsByValue(di DatasetInterface, colidx int, value interface{}) (splitL DatasetInterface, splitR DatasetInterface, e error)
type Claset
- func NewClaset(mode int, types []int, names []string) (claset *Claset)
- func (claset *Claset) Clone() interface{}
- func (claset *Claset) CountValueSpaces()
- func (claset *Claset) Counts() []int
- func (claset *Claset) GetClassAsInteger() []int64
- func (claset *Claset) GetClassAsReals() []float64
- func (claset *Claset) GetClassAsStrings() []string
- func (claset *Claset) GetClassColumn() *Column
- func (claset *Claset) GetClassIndex() int
- func (claset *Claset) GetClassRecords() *Records
- func (claset *Claset) GetClassType() int
- func (claset *Claset) GetClassValueSpace() []string
- func (claset *Claset) GetDataset() DatasetInterface
- func (claset *Claset) GetMinorityRows() *Rows
- func (claset *Claset) IsInSingleClass() (single bool, class string)
- func (claset *Claset) MajorityClass() string
- func (claset *Claset) MinorityClass() string
- func (claset *Claset) RecountMajorMinor()
- func (claset *Claset) SetClassIndex(v int)
- func (claset *Claset) SetDataset(dataset DatasetInterface)
- func (claset *Claset) SetMajorityClass(v string)
- func (claset *Claset) SetMinorityClass(v string)
- func (claset *Claset) String() (s string)
type ClasetInterface
type Column
- func NewColumn(colType int, colName string) (col *Column)
- func NewColumnInt(data []int64, colName string) (col *Column)
- func NewColumnReal(data []float64, colName string) (col *Column)
- func NewColumnString(data []string, colType int, colName string) (col *Column, e error)
- func (col *Column) ClearValues()
- func (col *Column) DeleteRecordAt(i int) *Record
- func (col *Column) GetName() string
- func (col *Column) GetType() int
- func (col *Column) Interface() interface{}
- func (col *Column) Len() int
- func (col *Column) PushBack(r *Record)
- func (col *Column) PushRecords(rs []*Record)
- func (col *Column) Reset()
- func (col *Column) SetName(name string)
- func (col *Column) SetRecords(recs *Records)
- func (col *Column) SetType(tipe int)
- func (col *Column) SetValueAt(idx int, v string)
- func (col *Column) SetValueByNumericAt(idx int, v float64)
- func (col *Column) SetValues(values []string)
- func (col *Column) ToFloatSlice() (newcol []float64)
- func (col *Column) ToIntegers() []int64
- func (col *Column) ToStringSlice() (newcol []string)
type ColumnInterface
type Columns
- func (cols *Columns) GetMinMaxLength() (min, max int)
- func (cols *Columns) Join(row int, sep, esc []byte) (v []byte)
- func (cols *Columns) Len() int
- func (cols *Columns) RandomPick(n int, dup bool, excludeIdx []int) (picked Columns, unpicked Columns, pickedIdx []int, unpickedIdx []int)
- func (cols *Columns) Reset()
- func (cols *Columns) SetTypes(types []int)
type Dataset
- func NewDataset(mode int, types []int, names []string) (dataset *Dataset)
- func (dataset *Dataset) AddColumn(tipe int, name string, vs []string)
- func (dataset *Dataset) Clone() interface{}
- func (dataset *Dataset) DeleteRow(i int) (row *Row)
- func (dataset *Dataset) FillRowsWithColumn(colIdx int, col Column)
- func (dataset *Dataset) GetColumn(idx int) (col *Column)
- func (dataset *Dataset) GetColumnByName(name string) (col *Column)
- func (dataset *Dataset) GetColumnTypeAt(idx int) (int, error)
- func (dataset *Dataset) GetColumns() *Columns
- func (dataset *Dataset) GetColumnsName() (names []string)
- func (dataset *Dataset) GetColumnsType() (types []int)
- func (dataset *Dataset) GetData() interface{}
- func (dataset *Dataset) GetDataAsColumns() (columns *Columns)
- func (dataset *Dataset) GetDataAsRows() *Rows
- func (dataset *Dataset) GetMode() int
- func (dataset *Dataset) GetNColumn() (ncol int)
- func (dataset *Dataset) GetNRow() (nrow int)
- func (dataset *Dataset) GetRow(idx int) *Row
- func (dataset *Dataset) GetRows() *Rows
- func (dataset *Dataset) Init(mode int, types []int, names []string)
- func (dataset *Dataset) Len() int
- func (dataset *Dataset) MergeColumns(other DatasetInterface)
- func (dataset *Dataset) MergeRows(other DatasetInterface)
- func (dataset *Dataset) PushColumn(col Column)
- func (dataset *Dataset) PushColumnToRows(col Column)
- func (dataset *Dataset) PushRow(row *Row)
- func (dataset *Dataset) PushRowToColumns(row *Row)
- func (dataset *Dataset) Reset() error
- func (dataset *Dataset) SetColumnTypeAt(idx, tipe int) error
- func (dataset *Dataset) SetColumns(cols *Columns)
- func (dataset *Dataset) SetColumnsName(names []string)
- func (dataset *Dataset) SetColumnsType(types []int)
- func (dataset *Dataset) SetMode(mode int)
- func (dataset *Dataset) SetRows(rows *Rows)
- func (dataset *Dataset) TransposeToColumns()
- func (dataset *Dataset) TransposeToRows()
type DatasetInterface
- func SelectColumnsByIdx(dataset DatasetInterface, colsIdx []int) (newset DatasetInterface)
- func SelectRowsWhere(dataset DatasetInterface, colidx int, colval string) DatasetInterface
type MapRows
- func (mapRows *MapRows) AddRow(k string, v *Row)
- func (mapRows *MapRows) GetMinority() (keyMin string, valMin Rows)
type MapRowsElement
type Matrix
type Record
- func NewRecord() *Record
- func NewRecordBy(v string, t int) (r *Record, e error)
- func NewRecordInt(v int64) (r *Record)
- func NewRecordReal(v float64) (r *Record)
- func NewRecordString(v string) (r *Record)
- func (r *Record) Bytes() []byte
- func (r *Record) Clone() *Record
- func (r *Record) Float() (f64 float64)
- func (r *Record) Integer() (i64 int64)
- func (r *Record) Interface() interface{}
- func (r *Record) IsEqual(o *Record) bool
- func (r *Record) IsEqualToInterface(v interface{}) bool
- func (r *Record) IsEqualToString(v string) bool
- func (r *Record) IsMissingValue() bool
- func (r *Record) IsNil() bool
- func (r *Record) Reset()
- func (r *Record) SetFloat(v float64)
- func (r *Record) SetInteger(v int64)
- func (r *Record) SetString(v string)
- func (r *Record) SetValue(v string, t int) error
- func (r Record) String() (s string)
- func (r *Record) Type() int
type Records
- func (recs *Records) CountWhere(v interface{}) (c int)
- func (recs *Records) CountsWhere(vs []interface{}) (counts []int)
- func (recs *Records) Len() int
- func (recs *Records) SortByIndex(sortedIdx []int) *Records
type Row
- func (row *Row) Clone() *Row
- func (row *Row) GetIntAt(idx int) (int64, bool)
- func (row *Row) GetRecord(i int) *Record
- func (row *Row) GetValueAt(idx int) (interface{}, bool)
- func (row *Row) IsEqual(other *Row) bool
- func (row *Row) IsNilAt(idx int) bool
- func (row *Row) Len() int
- func (row *Row) PushBack(r *Record)
- func (row *Row) SetValueAt(idx int, rec *Record)
- func (row *Row) Types() (types []int)
type Rows
- func (rows *Rows) Contain(xrow *Row) (bool, int)
- func (rows *Rows) Contains(xrows Rows) (isin bool, indices []int)
- func (rows *Rows) Del(i int) (row *Row)
- func (rows *Rows) GroupByValue(groupIdx int) (mapRows MapRows)
- func (rows *Rows) Len() int
- func (rows *Rows) PopFront() (row *Row)
- func (rows *Rows) PopFrontAsRows() (newRows Rows)
- func (rows *Rows) PushBack(r *Row)
- func (rows *Rows) RandomPick(n int, duplicate bool) (picked Rows, unpicked Rows, pickedIdx []int, unpickedIdx []int)
- func (rows *Rows) SelectWhere(colidx int, colval string) (selected Rows)
- func (rows Rows) String() (s string)

Constants ¶

View Source

const (
	// DatasetNoMode default to matrix.
	DatasetNoMode = 0
	// DatasetModeRows for output mode in rows.
	DatasetModeRows = 1
	// DatasetModeColumns for output mode in columns.
	DatasetModeColumns = 2
	// DatasetModeMatrix will save data in rows and columns.
	DatasetModeMatrix = 4
)

View Source

const (
	// TUndefined for undefined type
	TUndefined = -1
	// TString string type.
	TString = 0
	// TInteger integer type (64 bit).
	TInteger = 1
	// TReal float type (64 bit).
	TReal = 2
)

Variables ¶

View Source

var (
	// ErrColIdxOutOfRange operation on column index is invalid
	ErrColIdxOutOfRange = errors.New("tabula: Column index out of range")
	// ErrInvalidColType operation on column with different type
	ErrInvalidColType = errors.New("tabula: Invalid column type")
	// ErrMisColLength returned when operation on columns does not match
	// between parameter and their length
	ErrMisColLength = errors.New("tabula: mismatch on column length")
)

Functions ¶

func RandomPickColumns ¶

func RandomPickColumns(dataset DatasetInterface, n int, dup bool, excludeIdx []int) (
	picked DatasetInterface,
	unpicked DatasetInterface,
	pickedIdx []int,
	unpickedIdx []int,
)

RandomPickColumns will select `n` column randomly from dataset and return new dataset with picked and unpicked columns, and their column index.

If duplicate is true, column that has been pick up can be pick up again.

If dataset output mode is rows, it will transposed to columns.

func RandomPickRows ¶

func RandomPickRows(dataset DatasetInterface, n int, duplicate bool) (
	picked DatasetInterface,
	unpicked DatasetInterface,
	pickedIdx []int,
	unpickedIdx []int,
)

RandomPickRows return `n` item of row that has been selected randomly from dataset.Rows. The ids of rows that has been picked is saved id `pickedIdx`.

If duplicate is true, the row that has been picked can be picked up again, otherwise it only allow one pick. This is also called as random selection with or without replacement in machine learning domain.

If output mode is columns, it will be transposed to rows.

func ReadDatasetConfig ¶

func ReadDatasetConfig(ds interface{}, fcfg string) (e error)

ReadDatasetConfig open dataset configuration file and initialize dataset field from there.

func SortColumnsByIndex ¶

func SortColumnsByIndex(di DatasetInterface, sortedIdx []int)

SortColumnsByIndex will sort all columns using sorted index.

func SplitRowsByCategorical ¶

func SplitRowsByCategorical(di DatasetInterface, colidx int, splitVal []string) (
	splitIn DatasetInterface,
	splitEx DatasetInterface,
	e error,
)

SplitRowsByCategorical will split the data using a set of split value in column `colidx`.

For example, given two attributes,

X: [A,B,A,B,C,D,C,D]
Y: [1,2,3,4,5,6,7,8]

if colidx is (0) or A and split value is a set `[A,C]`, the data will splitted into left set which contain all rows that have A or C,

X': [A,A,C,C]
Y': [1,3,5,7]

and the right set, excluded set, will contain all rows which is not A or C,

X'': [B,B,D,D]
Y'': [2,4,6,8]

func SplitRowsByNumeric ¶

func SplitRowsByNumeric(di DatasetInterface, colidx int, splitVal float64) (
	splitLess DatasetInterface,
	splitGreater DatasetInterface,
	e error,
)

SplitRowsByNumeric will split the data using splitVal in column `colidx`.

For example, given two continuous attribute,

A: {1,2,3,4}
B: {5,6,7,8}

if colidx is (1) B and splitVal is 7, the data will splitted into left set

A': {1,2}
B': {5,6}

and right set

A'': {3,4}
B'': {7,8}

func SplitRowsByValue ¶

func SplitRowsByValue(di DatasetInterface, colidx int, value interface{}) (
	splitL DatasetInterface,
	splitR DatasetInterface,
	e error,
)

SplitRowsByValue generic function to split data by value. This function will split data using value in column `colidx`. If value is numeric it will return any rows that have column value less than `value` in `splitL`, and any column value greater or equal to `value` in `splitR`.

Types ¶

type Claset ¶

type Claset struct {

	// Dataset embedded, for implementing the dataset interface.
	Dataset

	// ClassIndex contain index for target classification in columns.
	ClassIndex int `json:"ClassIndex"`
	// contains filtered or unexported fields
}

Claset define a dataset with class attribute.

func NewClaset ¶

func NewClaset(mode int, types []int, names []string) (claset *Claset)

NewClaset create and return new Claset object.

func (*Claset) Clone ¶

func (claset *Claset) Clone() interface{}

Clone return a copy of current claset object.

func (*Claset) CountValueSpaces ¶

func (claset *Claset) CountValueSpaces()

CountValueSpaces will count number of value space in current dataset.

func (*Claset) Counts ¶

func (claset *Claset) Counts() []int

Counts return the number of each class in value-space.

func (*Claset) GetClassAsInteger ¶

func (claset *Claset) GetClassAsInteger() []int64

GetClassAsInteger return class record value as slice of int64.

func (*Claset) GetClassAsReals ¶

func (claset *Claset) GetClassAsReals() []float64

GetClassAsReals return class record value as slice of float64.

func (*Claset) GetClassAsStrings ¶

func (claset *Claset) GetClassAsStrings() []string

GetClassAsStrings return all class values as slice of string.

func (*Claset) GetClassColumn ¶

func (claset *Claset) GetClassColumn() *Column

GetClassColumn return dataset class values in column.

func (*Claset) GetClassIndex ¶

func (claset *Claset) GetClassIndex() int

GetClassIndex return index of class attribute in dataset.

func (*Claset) GetClassRecords ¶

func (claset *Claset) GetClassRecords() *Records

GetClassRecords return class values as records.

func (*Claset) GetClassType ¶

func (claset *Claset) GetClassType() int

GetClassType return type of class in dataset.

func (*Claset) GetClassValueSpace ¶

func (claset *Claset) GetClassValueSpace() []string

GetClassValueSpace return the class value space.

func (*Claset) GetDataset ¶

func (claset *Claset) GetDataset() DatasetInterface

GetDataset return the dataset.

func (*Claset) GetMinorityRows ¶

func (claset *Claset) GetMinorityRows() *Rows

GetMinorityRows return rows where their class is minority in dataset, or nil if dataset is empty.

func (*Claset) IsInSingleClass ¶

func (claset *Claset) IsInSingleClass() (single bool, class string)

IsInSingleClass check whether all target class contain only single value. Return true and name of target if all rows is in the same class, false and empty string otherwise.

func (*Claset) MajorityClass ¶

func (claset *Claset) MajorityClass() string

MajorityClass return the majority class of data.

func (*Claset) MinorityClass ¶

func (claset *Claset) MinorityClass() string

MinorityClass return the minority class in dataset.

func (*Claset) RecountMajorMinor ¶

func (claset *Claset) RecountMajorMinor()

RecountMajorMinor recount major and minor class in claset.

func (*Claset) SetClassIndex ¶

func (claset *Claset) SetClassIndex(v int)

SetClassIndex will set the class index to `v`.

func (*Claset) SetDataset ¶

func (claset *Claset) SetDataset(dataset DatasetInterface)

SetDataset in class set.

func (*Claset) SetMajorityClass ¶

func (claset *Claset) SetMajorityClass(v string)

SetMajorityClass will set the majority class to `v`.

func (*Claset) SetMinorityClass ¶

func (claset *Claset) SetMinorityClass(v string)

SetMinorityClass will set the minority class to `v`.

func (*Claset) String ¶

func (claset *Claset) String() (s string)

String, yes it will pretty print the meta-data in JSON format.

type ClasetInterface ¶

type ClasetInterface interface {
	DatasetInterface

	GetClassType() int
	GetClassValueSpace() []string
	GetClassColumn() *Column
	GetClassRecords() *Records
	GetClassAsStrings() []string
	GetClassAsReals() []float64
	GetClassIndex() int
	MajorityClass() string
	MinorityClass() string
	Counts() []int

	SetDataset(DatasetInterface)
	SetClassIndex(int)
	SetMajorityClass(string)
	SetMinorityClass(string)

	CountValueSpaces()
	RecountMajorMinor()
	IsInSingleClass() (bool, string)

	GetMinorityRows() *Rows
}

ClasetInterface is the interface for working with dataset containing class or target attribute. It embed dataset interface.

Yes, the name is Claset with single `s` not Classset with triple `s` to minimize typo.

type Column ¶

type Column struct {
	// Name of column. String identifier for the column.
	Name string

	// ValueSpace contain the possible value in records
	ValueSpace []string

	// Records contain column data.
	Records Records

	// Type of column. All record in column have the same type.
	Type int

	// Flag additional attribute that can be set to mark some value on this
	// column
	Flag int
}

Column represent slice of record. A vertical representation of data.

func NewColumn ¶

func NewColumn(colType int, colName string) (col *Column)

NewColumn return new column with type and name.

func NewColumnInt ¶

func NewColumnInt(data []int64, colName string) (col *Column)

NewColumnInt create new column with record type as integer, and fill it with `data`.

func NewColumnReal ¶

func NewColumnReal(data []float64, colName string) (col *Column)

NewColumnReal create new column with record type is real.

func NewColumnString ¶

func NewColumnString(data []string, colType int, colName string) (
	col *Column,
	e error,
)

NewColumnString initialize column with type anda data as string.

func (*Column) ClearValues ¶

func (col *Column) ClearValues()

ClearValues set all value in column to empty string or zero if column type is numeric.

func (*Column) DeleteRecordAt ¶

func (col *Column) DeleteRecordAt(i int) *Record

DeleteRecordAt will delete record at index `i` and return it.

func (*Column) GetName ¶

func (col *Column) GetName() string

GetName return the column name.

func (*Column) GetType ¶

func (col *Column) GetType() int

GetType return the type of column.

func (*Column) Interface ¶

func (col *Column) Interface() interface{}

Interface return the column object as an interface.

func (*Column) Len ¶

func (col *Column) Len() int

Len return number of record.

func (*Column) PushBack ¶

func (col *Column) PushBack(r *Record)

PushBack push record the end of column.

func (*Column) PushRecords ¶

func (col *Column) PushRecords(rs []*Record)

PushRecords append slice of record to the end of column's records.

func (*Column) Reset ¶

func (col *Column) Reset()

Reset column data and flag.

func (*Column) SetName ¶

func (col *Column) SetName(name string)

SetName will set the name of column to `name`.

func (*Column) SetRecords ¶

func (col *Column) SetRecords(recs *Records)

SetRecords will set records in column to `recs`.

func (*Column) SetType ¶

func (col *Column) SetType(tipe int)

SetType will set the type of column to `tipe`.

func (*Column) SetValueAt ¶

func (col *Column) SetValueAt(idx int, v string)

SetValueAt will set column value at cell `idx` with `v`, unless the index is out of range.

func (*Column) SetValueByNumericAt ¶

func (col *Column) SetValueByNumericAt(idx int, v float64)

SetValueByNumericAt will set column value at cell `idx` with numeric value `v`, unless the index is out of range.

func (*Column) SetValues ¶

func (col *Column) SetValues(values []string)

SetValues of all column record.

func (*Column) ToFloatSlice ¶

func (col *Column) ToFloatSlice() (newcol []float64)

ToFloatSlice convert slice of record to slice of float64.

func (*Column) ToIntegers ¶

func (col *Column) ToIntegers() []int64

ToIntegers convert slice of record to slice of int64.

func (*Column) ToStringSlice ¶

func (col *Column) ToStringSlice() (newcol []string)

ToStringSlice convert slice of record to slice of string.

type ColumnInterface ¶

type ColumnInterface interface {
	SetType(tipe int)
	SetName(name string)

	GetType() int
	GetName() string

	SetRecords(recs *Records)

	Interface() interface{}
}

ColumnInterface define an interface for working with Column.

type Columns ¶

type Columns []Column

Columns represent slice of Column.

func (*Columns) GetMinMaxLength ¶

func (cols *Columns) GetMinMaxLength() (min, max int)

GetMinMaxLength given a slice of column, find the minimum and maximum column length among them.

func (*Columns) Join ¶

func (cols *Columns) Join(row int, sep, esc []byte) (v []byte)

Join all column records value at index `row` using separator `sep` and make sure if there is a separator in value it will be escaped with `esc`.

Given slice of columns, where row is 1 and sep is `,` and escape is `\`

  0 1 2
0 A B C
1 D , F <- row
2 G H I

this function will return "D,\,,F" in bytes.

func (*Columns) Len ¶

func (cols *Columns) Len() int

Len return length of columns.

func (*Columns) RandomPick ¶

func (cols *Columns) RandomPick(n int, dup bool, excludeIdx []int) (
	picked Columns,
	unpicked Columns,
	pickedIdx []int,
	unpickedIdx []int,
)

RandomPick column in columns until n item and return it like its has been shuffled. If duplicate is true, column that has been picked can be picked up again, otherwise it will only picked up once.

This function return picked and unpicked column and index of them.

func (*Columns) Reset ¶

func (cols *Columns) Reset()

Reset each data and attribute in all columns.

func (*Columns) SetTypes ¶

func (cols *Columns) SetTypes(types []int)

SetTypes of each column. The length of type must be equal with the number of column, otherwise it will used the minimum length between types or columns.

type Dataset ¶

type Dataset struct {
	// Columns is input data that has been parsed.
	Columns Columns

	// Rows is input data that has been parsed.
	Rows Rows

	// Mode define the numeric value of output mode.
	Mode int
}

Dataset contain the data, mode of saved data, number of columns and rows in data.

func NewDataset ¶

func NewDataset(mode int, types []int, names []string) (
	dataset *Dataset,
)

NewDataset create new dataset, use the mode to initialize the dataset.

func (*Dataset) AddColumn ¶

func (dataset *Dataset) AddColumn(tipe int, name string, vs []string)

AddColumn will create and add new empty column with specific type and name into dataset.

func (*Dataset) Clone ¶

func (dataset *Dataset) Clone() interface{}

Clone return a copy of current dataset.

func (*Dataset) DeleteRow ¶

func (dataset *Dataset) DeleteRow(i int) (row *Row)

DeleteRow will detach row at index `i` from dataset and return it.

func (*Dataset) FillRowsWithColumn ¶

func (dataset *Dataset) FillRowsWithColumn(colIdx int, col Column)

FillRowsWithColumn given a column, fill the dataset with row where the record only set at index `colIdx`.

Example, content of dataset was,

index: 0 1 2

A B C
X     (step 1) nrow = 2

If we filled column at index 2 with [Y Z], the dataset will become:

index: 0 1 2

A B C
X   Y (step 2) fill the empty row
    Z (step 3) create dummy row which contain the rest of column data.

func (*Dataset) GetColumn ¶

func (dataset *Dataset) GetColumn(idx int) (col *Column)

GetColumn return pointer to column object at index `idx`. If `idx` is out of range return nil.

func (*Dataset) GetColumnByName ¶

func (dataset *Dataset) GetColumnByName(name string) (col *Column)

GetColumnByName return column based on their `name`.

func (*Dataset) GetColumnTypeAt ¶

func (dataset *Dataset) GetColumnTypeAt(idx int) (int, error)

GetColumnTypeAt return type of column in index `colidx` in dataset.

func (*Dataset) GetColumns ¶

func (dataset *Dataset) GetColumns() *Columns

GetColumns return columns in dataset, without transposing.

func (*Dataset) GetColumnsName ¶

func (dataset *Dataset) GetColumnsName() (names []string)

GetColumnsName return name of all columns.

func (*Dataset) GetColumnsType ¶

func (dataset *Dataset) GetColumnsType() (types []int)

GetColumnsType return the type of all columns.

func (*Dataset) GetData ¶

func (dataset *Dataset) GetData() interface{}

GetData return the data, based on mode (rows, columns, or matrix).

func (*Dataset) GetDataAsColumns ¶

func (dataset *Dataset) GetDataAsColumns() (columns *Columns)

GetDataAsColumns return data in columns mode.

func (*Dataset) GetDataAsRows ¶

func (dataset *Dataset) GetDataAsRows() *Rows

GetDataAsRows return data in rows mode.

func (*Dataset) GetMode ¶

func (dataset *Dataset) GetMode() int

GetMode return mode of data.

func (*Dataset) GetNColumn ¶

func (dataset *Dataset) GetNColumn() (ncol int)

GetNColumn return the number of column in dataset.

func (*Dataset) GetNRow ¶

func (dataset *Dataset) GetNRow() (nrow int)

GetNRow return number of rows in dataset.

func (*Dataset) GetRow ¶

func (dataset *Dataset) GetRow(idx int) *Row

GetRow return pointer to row at index `idx` or nil if index is out of range.

func (*Dataset) GetRows ¶

func (dataset *Dataset) GetRows() *Rows

GetRows return rows in dataset, without transposing.

func (*Dataset) Init ¶

func (dataset *Dataset) Init(mode int, types []int, names []string)

Init will set the dataset using mode and types.

func (*Dataset) Len ¶

func (dataset *Dataset) Len() int

Len return number of row in dataset.

func (*Dataset) MergeColumns ¶

func (dataset *Dataset) MergeColumns(other DatasetInterface)

MergeColumns append columns from other dataset into current dataset.

func (*Dataset) MergeRows ¶

func (dataset *Dataset) MergeRows(other DatasetInterface)

MergeRows append rows from other dataset into current dataset.

func (*Dataset) PushColumn ¶

func (dataset *Dataset) PushColumn(col Column)

PushColumn will append new column to the end of slice if no existing column with the same name. If it exist, the records will be merged.

func (*Dataset) PushColumnToRows ¶

func (dataset *Dataset) PushColumnToRows(col Column)

PushColumnToRows add each record in column to each rows, from top to bottom.

func (*Dataset) PushRow ¶

func (dataset *Dataset) PushRow(row *Row)

PushRow save the data, which is already in row object, to Rows.

func (*Dataset) PushRowToColumns ¶

func (dataset *Dataset) PushRowToColumns(row *Row)

PushRowToColumns push each data in Row to Columns.

func (*Dataset) Reset ¶

func (dataset *Dataset) Reset() error

Reset all data and attributes.

func (*Dataset) SetColumnTypeAt ¶

func (dataset *Dataset) SetColumnTypeAt(idx, tipe int) error

SetColumnTypeAt will set column type at index `colidx` to `tipe`.

func (*Dataset) SetColumns ¶

func (dataset *Dataset) SetColumns(cols *Columns)

SetColumns will replace current columns with new one from parameter.

func (*Dataset) SetColumnsName ¶

func (dataset *Dataset) SetColumnsName(names []string)

SetColumnsName set column name.

func (*Dataset) SetColumnsType ¶

func (dataset *Dataset) SetColumnsType(types []int)

SetColumnsType of data in all columns.

func (*Dataset) SetMode ¶

func (dataset *Dataset) SetMode(mode int)

SetMode of saved data to `mode`.

func (*Dataset) SetRows ¶

func (dataset *Dataset) SetRows(rows *Rows)

SetRows will replace current rows with new one from parameter.

func (*Dataset) TransposeToColumns ¶

func (dataset *Dataset) TransposeToColumns()

TransposeToColumns move all data from rows (horizontal) to columns (vertical) mode.

func (*Dataset) TransposeToRows ¶

func (dataset *Dataset) TransposeToRows()

TransposeToRows will move all data from columns (vertical) to rows (horizontal) mode.

type DatasetInterface ¶

type DatasetInterface interface {
	Init(mode int, types []int, names []string)
	Clone() interface{}
	Reset() error

	GetMode() int
	SetMode(mode int)

	GetNColumn() int
	GetNRow() int
	Len() int

	GetColumnsType() []int
	SetColumnsType(types []int)

	GetColumnTypeAt(idx int) (int, error)
	SetColumnTypeAt(idx, tipe int) error

	GetColumnsName() []string
	SetColumnsName(names []string)

	AddColumn(tipe int, name string, vs []string)
	GetColumn(idx int) *Column
	GetColumnByName(name string) *Column
	GetColumns() *Columns
	SetColumns(*Columns)

	GetRow(idx int) *Row
	GetRows() *Rows
	SetRows(*Rows)
	DeleteRow(idx int) *Row

	GetData() interface{}
	GetDataAsRows() *Rows
	GetDataAsColumns() *Columns

	TransposeToColumns()
	TransposeToRows()

	PushRow(r *Row)
	PushRowToColumns(r *Row)
	FillRowsWithColumn(colidx int, col Column)
	PushColumn(col Column)
	PushColumnToRows(col Column)

	MergeColumns(DatasetInterface)
	MergeRows(DatasetInterface)
}

DatasetInterface is the interface for working with DSV data.

func SelectColumnsByIdx ¶

func SelectColumnsByIdx(dataset DatasetInterface, colsIdx []int) (
	newset DatasetInterface,
)

SelectColumnsByIdx return new dataset with selected column index.

func SelectRowsWhere ¶

func SelectRowsWhere(dataset DatasetInterface, colidx int, colval string) DatasetInterface

SelectRowsWhere return all rows which column value in `colidx` is equal to `colval`.

type MapRows ¶

type MapRows []MapRowsElement

MapRows represent a list of mapping between string key and rows.

func (*MapRows) AddRow ¶

func (mapRows *MapRows) AddRow(k string, v *Row)

AddRow will append a row `v` into map value if they key `k` exist in map, otherwise it will insert a new map element.

func (*MapRows) GetMinority ¶

func (mapRows *MapRows) GetMinority() (keyMin string, valMin Rows)

GetMinority return map value which contain the minimum rows.

type MapRowsElement ¶

type MapRowsElement struct {
	Key   string
	Value Rows
}

MapRowsElement represent a single mapping of string key to rows.

type Matrix ¶

type Matrix struct {
	Columns *Columns
	Rows    *Rows
}

Matrix is a combination of columns and rows.

type Record ¶

type Record struct {
	// contains filtered or unexported fields
}

Record represent the smallest building block of data-set.

func NewRecord ¶

func NewRecord() *Record

NewRecord will create and return record with nil value.

func NewRecordBy ¶

func NewRecordBy(v string, t int) (r *Record, e error)

NewRecordBy create new record from string with type set to `t`.

func NewRecordInt ¶

func NewRecordInt(v int64) (r *Record)

NewRecordInt create new record from integer value.

func NewRecordReal ¶

func NewRecordReal(v float64) (r *Record)

NewRecordReal create new record from float value.

func NewRecordString ¶

func NewRecordString(v string) (r *Record)

NewRecordString will create new record from string.

func (*Record) Bytes ¶

func (r *Record) Bytes() []byte

Bytes convert record value to slice of byte.

func (*Record) Clone ¶

func (r *Record) Clone() *Record

Clone will create and return a clone of record.

func (*Record) Float ¶

func (r *Record) Float() (f64 float64)

Float convert given record to float value. If its failed it will return the -Infinity value.

func (*Record) Integer ¶

func (r *Record) Integer() (i64 int64)

Integer convert given record to integer value. If its failed, it will return the minimum integer in 64bit.

func (*Record) Interface ¶

func (r *Record) Interface() interface{}

Interface return record value as interface.

func (*Record) IsEqual ¶

func (r *Record) IsEqual(o *Record) bool

IsEqual return true if record is equal with other, otherwise return false.

func (*Record) IsEqualToInterface ¶

func (r *Record) IsEqualToInterface(v interface{}) bool

IsEqualToInterface return true if interface type and value equal to record type and value.

func (*Record) IsEqualToString ¶

func (r *Record) IsEqualToString(v string) bool

IsEqualToString return true if string representation of record value is equal to string `v`.

func (*Record) IsMissingValue ¶

func (r *Record) IsMissingValue() bool

IsMissingValue check wether the value is a missing attribute.

If its string the missing value is indicated by character '?'.

If its integer the missing value is indicated by minimum negative integer, or math.MinInt64.

If its real the missing value is indicated by -Inf.

func (*Record) IsNil ¶

func (r *Record) IsNil() bool

IsNil return true if record has not been set with value, or nil.

func (*Record) Reset ¶

func (r *Record) Reset()

Reset will reset record value to empty string or zero, depend on type.

func (*Record) SetFloat ¶

func (r *Record) SetFloat(v float64)

SetFloat will set the record value with float 64bit.

func (*Record) SetInteger ¶

func (r *Record) SetInteger(v int64)

SetInteger will set the record value with integer 64bit.

func (*Record) SetString ¶

func (r *Record) SetString(v string)

SetString will set the record value with string value.

func (*Record) SetValue ¶

func (r *Record) SetValue(v string, t int) error

SetValue set the record value from string using type `t`. If value can not be converted to type, it will return an error.

func (Record) String ¶

func (r Record) String() (s string)

String convert record value to string.

func (*Record) Type ¶

func (r *Record) Type() int

Type of record.

type Records ¶

type Records []*Record

Records define slice of pointer to Record.

func (*Records) CountWhere ¶

func (recs *Records) CountWhere(v interface{}) (c int)

CountWhere return number of record where its value is equal to `v` type and value.

func (*Records) CountsWhere ¶

func (recs *Records) CountsWhere(vs []interface{}) (counts []int)

CountsWhere will return count of each value in slice `sv`.

func (*Records) Len ¶

func (recs *Records) Len() int

Len will return the length of records.

func (*Records) SortByIndex ¶

func (recs *Records) SortByIndex(sortedIdx []int) *Records

SortByIndex will sort the records using slice of index `sortedIDx` and return it.

type Row ¶

type Row []*Record

Row represent slice of record.

func (*Row) Clone ¶

func (row *Row) Clone() *Row

Clone create and return a clone of row.

func (*Row) GetIntAt ¶

func (row *Row) GetIntAt(idx int) (int64, bool)

GetIntAt return the integer value of row record at index `idx`. If the index is out of range it will return 0 and false.

func (*Row) GetRecord ¶

func (row *Row) GetRecord(i int) *Record

GetRecord will return pointer to record at index `i`, or nil if index is out of range.

func (*Row) GetValueAt ¶

func (row *Row) GetValueAt(idx int) (interface{}, bool)

GetValueAt return the value of row record at index `idx`. If the index is out of range it will return nil and false

func (*Row) IsEqual ¶

func (row *Row) IsEqual(other *Row) bool

IsEqual return true if row content equal with `other` row, otherwise return false.

func (*Row) IsNilAt ¶

func (row *Row) IsNilAt(idx int) bool

IsNilAt return true if there is no record value in row at `idx`, otherwise return false.

func (*Row) Len ¶

func (row *Row) Len() int

Len return number of record in row.

func (*Row) PushBack ¶

func (row *Row) PushBack(r *Record)

PushBack will add new record to the end of row.

func (*Row) SetValueAt ¶

func (row *Row) SetValueAt(idx int, rec *Record)

SetValueAt will set the value of row at cell index `idx` with record `rec`.

func (*Row) Types ¶

func (row *Row) Types() (types []int)

Types return type of all records.

type Rows ¶

type Rows []*Row

Rows represent slice of Row.

func (*Rows) Contain ¶

func (rows *Rows) Contain(xrow *Row) (bool, int)

Contain return true and index of row, if rows has data that has the same value with `row`, otherwise return false and -1 as index.

func (*Rows) Contains ¶

func (rows *Rows) Contains(xrows Rows) (isin bool, indices []int)

Contains return true and indices of row, if rows has data that has the same value with `rows`, otherwise return false and empty indices.

func (*Rows) Del ¶

func (rows *Rows) Del(i int) (row *Row)

Del will detach row at index `i` from slice and return it.

func (*Rows) GroupByValue ¶

func (rows *Rows) GroupByValue(groupIdx int) (mapRows MapRows)

GroupByValue will group each row based on record value in index recGroupIdx into map of string -> *Row.

WARNING: returned rows will be empty!

For example, given rows with target group in column index 1,

[1 +]
[2 -]
[3 -]
[4 +]

this function will create a map with key is string of target and value is pointer to sub-rows,

-> [1 +] [4 +]
-> [2 -] [3 -]

func (*Rows) Len ¶

func (rows *Rows) Len() int

Len return number of row.

func (*Rows) PopFront ¶

func (rows *Rows) PopFront() (row *Row)

PopFront remove the head, return the record value.

func (*Rows) PopFrontAsRows ¶

func (rows *Rows) PopFrontAsRows() (newRows Rows)

PopFrontAsRows remove the head and return ex-head as new rows.

func (*Rows) PushBack ¶

func (rows *Rows) PushBack(r *Row)

PushBack append record r to the end of rows.

func (*Rows) RandomPick ¶

func (rows *Rows) RandomPick(n int, duplicate bool) (
	picked Rows,
	unpicked Rows,
	pickedIdx []int,
	unpickedIdx []int,
)

RandomPick row in rows until n item and return it like its has been shuffled. If duplicate is true, row that has been picked can be picked up again, otherwise it will only picked up once.

This function return picked and unpicked rows and index of them.

func (*Rows) SelectWhere ¶

func (rows *Rows) SelectWhere(colidx int, colval string) (selected Rows)

SelectWhere return all rows which column value in `colidx` is equal to `colval`.

func (Rows) String ¶

func (rows Rows) String() (s string)

String return the string representation of each row.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL