table

package
v0.1.3
Published: May 5, 2024 License: BSD-3-Clause Imports: 22 Imported by: 0

README

table

table provides a DataTable / DataFrame structure similar to pandas and xarray in Python, and Apache Arrow Table, using tensor n-dimensional columns aligned by common outermost row dimension.

See examples/dataproc for a demo of how to use this system for data analysis, paralleling the example in Python Data Science using pandas, to see directly how that translates into this framework.

As a general convention, it is safest, clearest, and quite fast to access columns by name instead of index (there is a map that caches the column indexes), so the base access method names generally take a column name argument, and those that take a column index have an Index suffix. In addition, we use the Try suffix for versions that return an error message. It is a bit painful for the writer of these methods but very convenient for the users.

The following packages are included:

  • bitslice is a Go slice of bytes []byte that has methods for setting individual bits, as if it was a slice of bools, while being 8x more memory efficient. This is used for encoding null entries in etensor, and as a Tensor of bool / bits there as well, and is generally very useful for binary (boolean) data.

  • etensor is a Tensor (n-dimensional array) object. etensor.Tensor is an interface that applies to many different type-specific instances, such as etensor.Float32. A tensor is just an etensor.Shape plus a slice holding the specific data type. Our tensor is based directly on the Apache Arrow project's tensor, and it fully interoperates with it. Arrow tensors are designed to be read-only, and we needed some extra support to make our etable.Table work well, so we had to roll our own. Our tensors also interoperate fully with Gonum's 2D-specific Matrix type for the 2D case.

  • etable has the etable.Table DataTable / DataFrame object, which is useful for many different data analysis and database functions, and also for holding patterns to present to a neural network, logs of output from the models, etc. An etable.Table is just a slice of etensor.Tensor columns that are all aligned along the outermost row dimension. Index-based indirection, which is essential for efficient Sort, Filter, etc., is provided by the etable.IndexView type, which is an indexed view into a Table. All data processing operations are defined on the IndexView.

  • eplot provides an interactive 2D plotting GUI in GoGi for Table data, using the gonum plot plotting package. You can select which columns to plot and specify various basic plot parameters.

  • tensorview provides an interactive tabular, spreadsheet-style GUI using GoGi for viewing and editing etable.Table and etensor.Tensor objects. The tensorview.TensorGrid also provides a colored grid display of higher-dimensional tensor data.

  • agg provides standard aggregation functions (Sum, Mean, Var, Std etc) operating over etable.IndexView views of Table data. It also defines standard AggFunc functions such as SumFunc which can be used for Agg functions on either a Tensor or IndexView.

  • tsragg provides the same agg functions as in agg, but operating on all the values in a given Tensor. Because of the indexed, row-based nature of tensors in a Table, these are not the same as the agg functions.

  • split supports splitting a Table into any number of indexed sub-views and aggregating over those (i.e., pivot tables), grouping, summarizing data, etc.

  • metric provides similarity / distance metrics such as Euclidean, Cosine, or Correlation that operate on slices of []float64 or []float32.

  • simat provides similarity / distance matrix computation methods operating on etensor.Tensor or etable.Table data. The SimMat type holds the resulting matrix and labels for the rows and columns, and has a special SimMatGrid view in etview for visualizing labeled similarity matrices.

  • pca provides principal-components-analysis (PCA) and covariance matrix computation functions.

  • clust provides standard agglomerative hierarchical clustering including ability to plot results in an eplot.

  • minmax is home of basic Min / Max range struct, and norm has lots of good functions for computing standard norms and normalizing vectors.

  • utils has various table-related command-line utility tools, including etcat, which combines multiple table files into one file, with an option for averaging column data.
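The bit-packing idea behind bitslice (one bool per bit of a []byte) can be sketched with the standard library alone. The type and method names below (bits, newBits, set, get) are hypothetical and are not the package's API; this is only a minimal illustration of the storage scheme.

```go
package main

import "fmt"

// bits is a hypothetical bit-packed bool slice: 8 logical values per byte.
type bits struct {
	n    int    // number of logical bool values
	data []byte // packed storage, (n+7)/8 bytes
}

// newBits allocates storage for n bool values, all initially false.
func newBits(n int) *bits {
	return &bits{n: n, data: make([]byte, (n+7)/8)}
}

// set writes value v at logical index i.
func (b *bits) set(i int, v bool) {
	if v {
		b.data[i/8] |= 1 << (i % 8)
	} else {
		b.data[i/8] &^= 1 << (i % 8)
	}
}

// get reads the value at logical index i.
func (b *bits) get(i int) bool {
	return b.data[i/8]&(1<<(i%8)) != 0
}

func main() {
	b := newBits(20)
	b.set(3, true)
	b.set(17, true)
	// 20 bool values fit in 3 bytes instead of 20.
	fmt.Println(b.get(3), b.get(4), b.get(17), len(b.data))
}
```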

Cheat Sheet

et is the etable pointer variable for examples below:

Table Access

Scalar columns:

val := et.CellFloat("ColName", row)
str := et.CellString("ColName", row)

Tensor (higher-dimensional) columns:

tsr := et.CellTensor("ColName", row) // entire tensor at cell (a row-level SubSpace of column tensor)
val := et.CellTensorFloat1D("ColName", row, cellidx) // cellidx is 1D index into cell tensor

Set Table Value

et.SetCellFloat("ColName", row, val)
et.SetCellString("ColName", row, str)

Tensor (higher-dimensional) columns:

et.SetCellTensor("ColName", row, tsr) // set entire tensor at cell 
et.SetCellTensorFloat1D("ColName", row, cellidx, val) // cellidx is 1D index into cell tensor

Find Value(s) in Column

Returns all rows whose value matches the given value, in string form (any number is converted to a string):

rows := et.RowsByString("ColName", "value", etable.Contains, etable.IgnoreCase)

Other options are etable.Equals instead of Contains to search for an exact full string, and etable.UseCase if case should be used instead of ignored.

Index Views (Sort, Filter, etc)

The IndexView provides a list of row-wise indexes into a table, and Sorting, Filtering and Splitting all operate on this index view without changing the underlying table data, for maximum efficiency and flexibility.

ix := etable.NewIndexView(et) // new view with all rows

Sort

ix.SortColName("Name", etable.Ascending) // etable.Ascending or etable.Descending
SortedTable := ix.NewTable() // turn an IndexView back into a new Table organized in order of indexes

or:

nmcl := et.ColByName("Name") // nmcl is an etensor of the Name column, cached
ix.Sort(func(t *etable.Table, i, j int) bool {
	return nmcl.StringValue1D(i) < nmcl.StringValue1D(j)
})

Filter

nmcl := et.ColByName("Name") // column we're filtering on
ix.Filter(func(t *etable.Table, row int) bool {
	// filter return value is for what to *keep* (=true), not exclude
	// here we keep any row with a name that contains the string "in"
	return strings.Contains(nmcl.StringValue1D(row), "in")
})

Splits ("pivot tables" etc), Aggregation

Create a table of mean values of the "Data" column grouped by unique entries in the "Name" column; the resulting table will be called "DataMean":

byNm := split.GroupBy(ix, []string{"Name"}) // column name(s) to group by
split.Agg(byNm, "Data", agg.AggMean) // aggregate the "Data" column with the mean, per group
gps := byNm.AggsToTable(etable.AddAggName) // etable.AddAggName or etable.ColNameOnly for naming cols
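What the grouping and aggregation calls above compute can be illustrated without the library. The sketch below assumes the "Name" and "Data" columns are available as two parallel slices; groupMean is a hypothetical helper, not the implementation of split.GroupBy or split.Agg.

```go
package main

import (
	"fmt"
	"sort"
)

// groupMean returns the sorted unique names and, for each one,
// the mean of the data values in the rows carrying that name.
func groupMean(names []string, data []float64) ([]string, []float64) {
	rows := map[string][]int{} // group name -> row indexes (the "split")
	for i, nm := range names {
		rows[nm] = append(rows[nm], i)
	}
	keys := make([]string, 0, len(rows))
	for k := range rows {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	means := make([]float64, len(keys))
	for gi, k := range keys {
		sum := 0.0
		for _, r := range rows[k] {
			sum += data[r]
		}
		means[gi] = sum / float64(len(rows[k]))
	}
	return keys, means
}

func main() {
	names := []string{"a", "b", "a", "b"}
	data := []float64{1, 10, 3, 30}
	keys, means := groupMean(names, data)
	fmt.Println(keys, means) // [a b] [2 20]
}
```

Each group is kept as a list of row indexes rather than a copy of the rows, which mirrors the index-based design described for IndexView.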

Describe (basic stats) all columns in a table:

ix := etable.NewIndexView(et) // new view with all rows
desc := agg.DescAll(ix) // summary stats of all columns
// get value at given column name (from original table), row "Mean"
mean := desc.CellFloat("ColNm", desc.RowsByString("Agg", "Mean", etable.Equals, etable.UseCase)[0])

CSV / TSV file format

Tables can be saved and loaded from CSV (comma separated values) or TSV (tab separated values) files. See the next section for special formatting of header strings in these files to record the type and tensor cell shapes.

Type and Tensor Headers

To capture the type and shape of the columns, we support the following header formatting. We weren't able to find any other widely supported standard (please let us know if there is one that we've missed!)

Here is the mapping of special header prefix characters to standard types:

'$': etensor.STRING,
'%': etensor.FLOAT32,
'#': etensor.FLOAT64,
'|': etensor.INT64,
'@': etensor.UINT8,
'^': etensor.BOOL,

Columns that have tensor cell shapes (not just scalars) are marked as such with the first such column having a <ndim:dim,dim..> suffix indicating the shape of the cells in this column, e.g., <2:5,4> indicates a 2D cell Y=5,X=4. Each individual column is then indexed as [ndims:x,y..] e.g., the first would be [2:0,0], then [2:0,1] etc.

Example

Here's a TSV file for a scalar String column (Name), a 2D 1x4 tensor float32 column (Input), and a 2D 1x2 float32 Output column.

_H:	$Name	%Input[2:0,0]<2:1,4>	%Input[2:0,1]	%Input[2:0,2]	%Input[2:0,3]	%Output[2:0,0]<2:1,2>	%Output[2:0,1]
_D:	Event_0	1	0	0	0	1	0
_D:	Event_1	0	1	0	0	1	0
_D:	Event_2	0	0	1	0	0	1
_D:	Event_3	0	0	0	1	0	1
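The header row of the example can be reproduced by a short generator. This is a sketch under the assumption that cell elements are listed in row-major order (last dimension varying fastest), as the example suggests; tensorHeaders and joinInts are hypothetical helpers, not functions from this package, and the sketch only covers tensor (non-scalar) columns.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// joinInts formats ints as a comma-separated list, e.g. "1,4".
func joinInts(v []int) string {
	s := make([]string, len(v))
	for i, x := range v {
		s[i] = strconv.Itoa(x)
	}
	return strings.Join(s, ",")
}

// tensorHeaders builds one header per cell element. The first header
// carries the <ndim:dim,dim..> suffix giving the overall cell shape.
func tensorHeaders(typeChar byte, name string, cellShape []int) []string {
	nd := len(cellShape)
	n := 1
	for _, d := range cellShape {
		n *= d
	}
	hdrs := make([]string, n)
	idx := make([]int, nd)
	for i := 0; i < n; i++ {
		// convert flat index i into an n-dimensional index
		rem := i
		for d := nd - 1; d >= 0; d-- {
			idx[d] = rem % cellShape[d]
			rem /= cellShape[d]
		}
		h := fmt.Sprintf("%c%s[%d:%s]", typeChar, name, nd, joinInts(idx))
		if i == 0 {
			h += fmt.Sprintf("<%d:%s>", nd, joinInts(cellShape))
		}
		hdrs[i] = h
	}
	return hdrs
}

func main() {
	fmt.Println(strings.Join(tensorHeaders('%', "Input", []int{1, 4}), "\t"))
	fmt.Println(strings.Join(tensorHeaders('%', "Output", []int{1, 2}), "\t"))
}
```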

Documentation

Index

Constants

const (
	// Ascending specifies an ascending sort direction for table Sort routines
	Ascending = true

	// Descending specifies a descending sort direction for table Sort routines
	Descending = false
)
const (
	// Headers is passed to CSV methods for the headers arg, to use headers
	// that capture full type and tensor shape information.
	Headers = true

	// NoHeaders is passed to CSV methods for the headers arg, to not use headers
	NoHeaders = false
)
const (
	// ColumnNameOnly means resulting agg table just has the original column name, no aggregation name
	ColumnNameOnly bool = true
	// AddAggName means resulting agg table columns have aggregation name appended
	AddAggName = false
)

Use these for the colName arg to AggsToTable*

const (
	// Contains means the string only needs to contain the target string (see Equals)
	Contains bool = true
	// Equals means the string must equal the target string (see Contains)
	Equals = false
	// IgnoreCase means that differences in case are ignored in comparing strings
	IgnoreCase = true
	// UseCase means that case matters when comparing strings
	UseCase = false
)

Named arg values for Contains, IgnoreCase
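The effect of these named bool values on a string comparison can be sketched as follows. The constants are redeclared locally with the same values so the example is self-contained; matchString is a hypothetical helper, and its behavior is an assumption based on the comments above rather than the package's actual matching code.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the named bool values documented above.
const (
	Contains   = true
	Equals     = false
	IgnoreCase = true
	UseCase    = false
)

// matchString reports whether val matches target, either as a
// substring (contains) or as a full string, optionally ignoring case.
func matchString(val, target string, contains, ignoreCase bool) bool {
	if ignoreCase {
		val = strings.ToLower(val)
		target = strings.ToLower(target)
	}
	if contains {
		return strings.Contains(val, target)
	}
	return val == target
}

func main() {
	fmt.Println(matchString("Linda", "IN", Contains, IgnoreCase)) // true
	fmt.Println(matchString("Linda", "linda", Equals, UseCase))   // false
}
```

Passing the named constants instead of bare true / false keeps call sites readable, which is the point of the "Use named args for greater clarity" advice in the method docs.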

Variables

var TableHeaderToType = map[byte]reflect.Kind{
	'$': reflect.String,
	'%': reflect.Float32,
	'#': reflect.Float64,
	'|': reflect.Int,
	'^': reflect.Bool,
}

TableHeaderToType maps special header characters to data type

Functions

func AddColumn

func AddColumn[T string | bool | float32 | float64 | int | int32 | byte](dt *Table, name string) tensor.Tensor

AddColumn adds a new column to the table, of given type and column name (which must be unique). The cells of this column hold a single scalar value: see AddTensorColumn for n-dimensional cells.

func AddTensorColumn

func AddTensorColumn[T string | bool | float32 | float64 | int | int32 | byte](dt *Table, name string, cellSizes []int, dimNames ...string) tensor.Tensor

AddTensorColumn adds a new n-dimensional column to the table, of given type, column name (which must be unique), and dimensionality of each _cell_. An outer-most Row dimension will be added to this dimensionality to create the tensor column.
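How the row dimension combines with the cell shape can be sketched with plain slices. The helpers below (columnShape, cellLen, flatOffset) are hypothetical, and the row-major layout they assume is an illustration, not a statement about the tensor package's internal storage.

```go
package main

import "fmt"

// columnShape prepends the outermost row dimension to the per-cell shape.
func columnShape(rows int, cellSizes []int) []int {
	shape := make([]int, 0, len(cellSizes)+1)
	shape = append(shape, rows)
	return append(shape, cellSizes...)
}

// cellLen is the number of elements in one cell.
func cellLen(cellSizes []int) int {
	n := 1
	for _, d := range cellSizes {
		n *= d
	}
	return n
}

// flatOffset gives the position of element cellIdx of the given row
// in a flat, row-major backing slice (assumed layout).
func flatOffset(row, cellIdx int, cellSizes []int) int {
	return row*cellLen(cellSizes) + cellIdx
}

func main() {
	cell := []int{5, 4}
	fmt.Println(columnShape(10, cell))  // [10 5 4]
	fmt.Println(flatOffset(2, 3, cell)) // 43
}
```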

func ConfigFromDataValues

func ConfigFromDataValues(dt *Table, hdrs []string, rec [][]string) error

ConfigFromDataValues configures a Table based on data types inferred from the string representation of given records, using header names if present.

func ConfigFromHeaders

func ConfigFromHeaders(dt *Table, hdrs []string, rec [][]string) error

ConfigFromHeaders attempts to configure a Table based on the headers. For non-table headers, data is examined to determine types.

func ConfigFromTableHeaders

func ConfigFromTableHeaders(dt *Table, hdrs []string) error

ConfigFromTableHeaders attempts to configure a Table based on special table headers

func DetectTableHeaders

func DetectTableHeaders(hdrs []string) bool

DetectTableHeaders looks for special header characters -- returns true if found

func InferDataType

func InferDataType(str string) reflect.Kind

InferDataType returns the inferred data type for the given string. It only deals with float64, int, and string types.
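One plausible way to infer among these three kinds is to try the strictest parse first. The sketch below is an assumption about the general approach, not the package's implementation; inferKind is a hypothetical name. Note that strconv.ParseFloat also accepts inputs such as "NaN" and "Inf", which a real implementation might treat differently.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// inferKind guesses a reflect.Kind for a string value: int if it
// parses as an integer, float64 if it parses as a float, else string.
func inferKind(s string) reflect.Kind {
	if _, err := strconv.Atoi(s); err == nil {
		return reflect.Int
	}
	if _, err := strconv.ParseFloat(s, 64); err == nil {
		return reflect.Float64
	}
	return reflect.String
}

func main() {
	for _, s := range []string{"42", "3.14", "Event_0"} {
		fmt.Println(s, inferKind(s))
	}
}
```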

func ShapeFromString

func ShapeFromString(dims string) []int

ShapeFromString parses string representation of shape as N:d,d,..
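A minimal parser for the N:d,d,.. form might look like the following. It is a sketch only: parseShape is a hypothetical function that returns an error for malformed input, whereas the documented ShapeFromString returns just []int, so its error handling is not known from this page.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseShape parses "N:d,d,.." into a slice of N dimension sizes.
func parseShape(s string) ([]int, error) {
	ndStr, dimStr, ok := strings.Cut(s, ":")
	if !ok {
		return nil, fmt.Errorf("shape %q: missing ':'", s)
	}
	nd, err := strconv.Atoi(ndStr)
	if err != nil {
		return nil, fmt.Errorf("shape %q: bad dimension count: %w", s, err)
	}
	parts := strings.Split(dimStr, ",")
	if len(parts) != nd {
		return nil, fmt.Errorf("shape %q: expected %d dims, got %d", s, nd, len(parts))
	}
	shape := make([]int, nd)
	for i, p := range parts {
		shape[i], err = strconv.Atoi(p)
		if err != nil {
			return nil, fmt.Errorf("shape %q: bad dim %q: %w", s, p, err)
		}
	}
	return shape, nil
}

func main() {
	fmt.Println(parseShape("2:5,4"))
}
```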

func TableColumnType

func TableColumnType(nm string) (reflect.Kind, string)

TableColumnType parses the column header for special table type information

func TableHeaderChar

func TableHeaderChar(typ reflect.Kind) byte

TableHeaderChar returns the special header character based on given data type

func UpdateSliceTable

func UpdateSliceTable(st any, dt *Table)

UpdateSliceTable updates given Table with data from the given slice of structs, which must be the same type as used to configure the table

Types

type Delims

type Delims int32 //enums:enum

Delims are standard CSV delimiter options (Tab, Comma, Space)

const (
	// Tab is the tab rune delimiter, for TSV tab separated values
	Tab Delims = iota

	// Comma is the comma rune delimiter, for CSV comma separated values
	Comma

	// Space is the space rune delimiter, for SSV space separated values
	Space

	// Detect is used during reading a file -- reads the first line and detects tabs or commas
	Detect
)
const DelimsN Delims = 4

DelimsN is the highest valid value for type Delims, plus one.

func DelimsValues

func DelimsValues() []Delims

DelimsValues returns all possible values for the type Delims.

func (Delims) Desc

func (i Delims) Desc() string

Desc returns the description of the Delims value.

func (Delims) Int64

func (i Delims) Int64() int64

Int64 returns the Delims value as an int64.

func (Delims) MarshalText

func (i Delims) MarshalText() ([]byte, error)

MarshalText implements the encoding.TextMarshaler interface.

func (Delims) Rune

func (dl Delims) Rune() rune

func (*Delims) SetInt64

func (i *Delims) SetInt64(in int64)

SetInt64 sets the Delims value from an int64.

func (*Delims) SetString

func (i *Delims) SetString(s string) error

SetString sets the Delims value from its string representation, and returns an error if the string is invalid.

func (Delims) String

func (i Delims) String() string

String returns the string representation of this Delims value.

func (*Delims) UnmarshalText

func (i *Delims) UnmarshalText(text []byte) error

UnmarshalText implements the encoding.TextUnmarshaler interface.

func (Delims) Values

func (i Delims) Values() []enums.Enum

Values returns all possible values for the type Delims.

type FilterFunc

type FilterFunc func(et *Table, row int) bool

FilterFunc is a function used for filtering that returns true if Table row should be included in the current filtered view of the table, and false if it should be removed.

type IndexView

type IndexView struct {

	// Table that we are an indexed view onto
	Table *Table

	// current indexes into Table
	Indexes []int
	// contains filtered or unexported fields
}

IndexView is an indexed wrapper around a table.Table that provides a specific view onto the Table defined by the set of indexes. This provides an efficient way of sorting and filtering a table by only updating the indexes while doing nothing to the Table itself. To produce a table that has data actually organized according to the indexed order, call the NewTable method. IndexView views on a table can also be organized together as Splits of the table rows, e.g., by grouping values along a given column.

func NewIndexView

func NewIndexView(et *Table) *IndexView

NewIndexView returns a new IndexView based on given table, initialized with sequential indexes

func (*IndexView) AddIndex

func (ix *IndexView) AddIndex(idx int)

AddIndex adds a new index to the list

func (*IndexView) AddRows

func (ix *IndexView) AddRows(n int)

AddRows adds n rows to end of underlying Table, and to the indexes in this view

func (*IndexView) Clone

func (ix *IndexView) Clone() *IndexView

Clone returns a copy of the current index view with its own index memory

func (*IndexView) CopyFrom

func (ix *IndexView) CopyFrom(oix *IndexView)

CopyFrom copies from given other IndexView (we have our own unique copy of indexes)

func (*IndexView) DeleteInvalid

func (ix *IndexView) DeleteInvalid()

DeleteInvalid deletes all invalid indexes from the list. Call this if rows (could) have been deleted from table.
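As an illustration, and assuming that "invalid" means an index outside the table's current row range, the cleanup could be sketched like this. deleteInvalid here is a hypothetical free function over a plain slice, not the method itself.

```go
package main

import "fmt"

// deleteInvalid removes every entry outside [0, nRows) in place,
// preserving the order of the remaining entries.
func deleteInvalid(indexes []int, nRows int) []int {
	out := indexes[:0]
	for _, ix := range indexes {
		if ix >= 0 && ix < nRows {
			out = append(out, ix)
		}
	}
	return out
}

func main() {
	idx := []int{4, 0, 7, 2, 9}
	// The table now has only 5 rows, so 7 and 9 are dropped.
	fmt.Println(deleteInvalid(idx, 5)) // [4 0 2]
}
```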

func (*IndexView) DeleteRows

func (ix *IndexView) DeleteRows(at, n int)

DeleteRows deletes n rows of indexes starting at given index in the list of indexes

func (*IndexView) Filter

func (ix *IndexView) Filter(filterFunc func(et *Table, row int) bool)

Filter filters the indexes into our Table using given Filter function. The Filter function operates directly on row numbers into the Table as these row numbers have already been projected through the indexes.

func (*IndexView) FilterColumn

func (ix *IndexView) FilterColumn(colIndex int, str string, exclude, contains, ignoreCase bool)

FilterColumn filters the indexes into our Table according to values in given column index, using string representation of column values. Includes rows with matching values unless exclude is set. If contains, only checks if row contains string; if ignoreCase, ignores case. Use named args for greater clarity. Only valid for 1-dimensional columns.

func (*IndexView) FilterColumnName

func (ix *IndexView) FilterColumnName(column string, str string, exclude, contains, ignoreCase bool) error

FilterColumnName filters the indexes into our Table according to values in given column name, using string representation of column values. Includes rows with matching values unless exclude is set. If contains, only checks if row contains string; if ignoreCase, ignores case. Use named args for greater clarity. Only valid for 1-dimensional columns. Returns error if column name not found.

func (*IndexView) InsertRows

func (ix *IndexView) InsertRows(at, n int)

InsertRows adds n rows to end of underlying Table, and to the indexes starting at given index in this view

func (*IndexView) Len

func (ix *IndexView) Len() int

Len returns the length of the index list

func (*IndexView) Less

func (ix *IndexView) Less(i, j int) bool

Less calls the LessFunc for sorting

func (*IndexView) NewTable

func (ix *IndexView) NewTable() *Table

NewTable returns a new table with column data organized according to the indexes

func (*IndexView) OpenCSV

func (ix *IndexView) OpenCSV(filename core.Filename, delim Delims) error

OpenCSV reads a table idx view from a comma-separated-values (CSV) file (where comma = any delimiter, specified in the delim arg), using the Go standard encoding/csv reader conforming to the official CSV standard. If the table does not currently have any columns, the first row of the file is assumed to be headers, and columns are constructed therefrom. If the file was saved from table with headers, then these have full configuration information for tensor type and dimensionality. If the table DOES have existing columns, then those are used robustly for whatever information fits from each row of the file.
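Reading delimited text with a non-comma delimiter through the standard encoding/csv reader can be sketched as below. readDelimited is a hypothetical helper that only splits headers from records; it does not interpret the type prefixes or configure a Table.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readDelimited parses text using the given delimiter and treats the
// first record as the header row.
func readDelimited(text string, delim rune) ([]string, [][]string, error) {
	r := csv.NewReader(strings.NewReader(text))
	r.Comma = delim // "comma" here is whatever delimiter is requested
	all, err := r.ReadAll()
	if err != nil {
		return nil, nil, err
	}
	if len(all) == 0 {
		return nil, nil, nil
	}
	return all[0], all[1:], nil
}

func main() {
	text := "$Name\t#Score\nEvent_0\t1.5\nEvent_1\t2.5\n"
	hdrs, recs, err := readDelimited(text, '\t')
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(hdrs, len(recs))
}
```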

func (*IndexView) OpenFS

func (ix *IndexView) OpenFS(fsys fs.FS, filename string, delim Delims) error

OpenFS is the version of IndexView.OpenCSV that uses an fs.FS filesystem.

func (*IndexView) Permuted

func (ix *IndexView) Permuted()

Permuted sets indexes to a permuted order -- if indexes already exist then existing list of indexes is permuted, otherwise a new set of permuted indexes are generated
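Both cases (generating a fresh permutation, or permuting an existing and possibly filtered index list) can be sketched with math/rand. The function names and the explicit seed parameter are hypothetical; how the package sources its randomness is not stated here.

```go
package main

import (
	"fmt"
	"math/rand"
)

// permuted returns the row indexes 0..n-1 in a random order.
func permuted(n int, seed int64) []int {
	rng := rand.New(rand.NewSource(seed))
	return rng.Perm(n)
}

// shuffle permutes an existing list of indexes in place, so a filtered
// view keeps the same set of rows in a new order.
func shuffle(indexes []int, seed int64) {
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(indexes), func(i, j int) {
		indexes[i], indexes[j] = indexes[j], indexes[i]
	})
}

func main() {
	fmt.Println(permuted(5, 1))
	idx := []int{2, 5, 9} // e.g. indexes left after filtering
	shuffle(idx, 1)
	fmt.Println(idx)
}
```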

func (*IndexView) RowsByString

func (ix *IndexView) RowsByString(column string, str string, contains, ignoreCase bool) []int

RowsByString returns the list of *our indexes* whose row in the table has given string value in given column name (de-reference our indexes to get actual row). If contains, only checks if row contains string; if ignoreCase, ignores case. Returns nil if name invalid -- see also Try. Use named args for greater clarity.

func (*IndexView) RowsByStringIndex

func (ix *IndexView) RowsByStringIndex(colIndex int, str string, contains, ignoreCase bool) []int

RowsByStringIndex returns the list of *our indexes* whose row in the table has given string value in given column index (de-reference our indexes to get actual row). If contains, only checks if row contains string; if ignoreCase, ignores case. Use named args for greater clarity.

func (*IndexView) RowsByStringTry

func (ix *IndexView) RowsByStringTry(column string, str string, contains, ignoreCase bool) ([]int, error)

RowsByStringTry returns the list of *our indexes* whose row in the table has given string value in given column name (de-reference our indexes to get actual row). If contains, only checks if row contains string; if ignoreCase, ignores case. Returns error message for invalid column name. Use named args for greater clarity.

func (*IndexView) SaveCSV

func (ix *IndexView) SaveCSV(filename core.Filename, delim Delims, headers bool) error

SaveCSV writes a table index view to a comma-separated-values (CSV) file (where comma = any delimiter, specified in the delim arg). If headers = true then generate column headers that capture the type and tensor cell geometry of the columns, enabling full reloading of exactly the same table format and data (recommended). Otherwise, only the data is written.

func (*IndexView) Sequential

func (ix *IndexView) Sequential()

Sequential sets indexes to sequential row-wise indexes into table

func (*IndexView) SetTable

func (ix *IndexView) SetTable(et *Table)

SetTable sets as indexes into given table with sequential initial indexes

func (*IndexView) Sort

func (ix *IndexView) Sort(lessFunc func(et *Table, i, j int) bool)

Sort sorts the indexes into our Table using given Less function. The Less function operates directly on row numbers into the Table as these row numbers have already been projected through the indexes.

func (*IndexView) SortColumn

func (ix *IndexView) SortColumn(colIndex int, ascending bool)

SortColumn sorts the indexes into our Table according to values in given column index, using either ascending or descending order. Only valid for 1-dimensional columns.

func (*IndexView) SortColumnName

func (ix *IndexView) SortColumnName(column string, ascending bool) error

SortColumnName sorts the indexes into our Table according to values in given column name, using either ascending or descending order. Only valid for 1-dimensional columns. Returns error if column name not found.

func (*IndexView) SortColumnNames

func (ix *IndexView) SortColumnNames(columns []string, ascending bool) error

SortColumnNames sorts the indexes into our Table according to values in given column names, using either ascending or descending order. Only valid for 1-dimensional columns. Returns error if column name not found.

func (*IndexView) SortColumns

func (ix *IndexView) SortColumns(colIndexes []int, ascending bool)

SortColumns sorts the indexes into our Table according to values in given list of column indexes, using either ascending or descending order for all of the columns. Only valid for 1-dimensional columns.

func (*IndexView) SortIndexes

func (ix *IndexView) SortIndexes()

SortIndexes sorts the indexes into our Table directly in numerical order, producing the native ordering, while preserving any filtering that might have occurred.

func (*IndexView) SortStable

func (ix *IndexView) SortStable(lessFunc func(et *Table, i, j int) bool)

SortStable stably sorts the indexes into our Table using given Less function. The Less function operates directly on row numbers into the Table as these row numbers have already been projected through the indexes. It is *essential* that it always returns false when the two are equal for the stable function to actually work.
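The requirement that the Less function return false for equal rows can be seen in a small sketch built on sort.SliceStable. stableSortIndexes is a hypothetical helper over a plain index slice, not the method itself.

```go
package main

import (
	"fmt"
	"sort"
)

// stableSortIndexes sorts indexes using less on raw row numbers,
// keeping the existing relative order of rows that compare equal.
func stableSortIndexes(indexes []int, less func(i, j int) bool) {
	sort.SliceStable(indexes, func(a, b int) bool {
		return less(indexes[a], indexes[b])
	})
}

func main() {
	group := []string{"b", "a", "b", "a"}
	idx := []int{0, 1, 2, 3}
	// Strict "<" returns false for equal values, so ties keep their order.
	stableSortIndexes(idx, func(i, j int) bool { return group[i] < group[j] })
	fmt.Println(idx) // [1 3 0 2]
}
```

Using "<=" instead of "<" would report equal rows as out of order, which defeats the stability guarantee.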

func (*IndexView) SortStableColumn

func (ix *IndexView) SortStableColumn(colIndex int, ascending bool)

SortStableColumn sorts the indexes into our Table according to values in given column index, using either ascending or descending order. Only valid for 1-dimensional columns.

func (*IndexView) SortStableColumnName

func (ix *IndexView) SortStableColumnName(column string, ascending bool) error

SortStableColumnName sorts the indexes into our Table according to values in given column name, using either ascending or descending order. Only valid for 1-dimensional columns. Returns error if column name not found.

func (*IndexView) SortStableColumnNames

func (ix *IndexView) SortStableColumnNames(columns []string, ascending bool) error

SortStableColumnNames sorts the indexes into our Table according to values in given column names, using either ascending or descending order. Only valid for 1-dimensional columns. Returns error if column name not found.

func (*IndexView) SortStableColumns

func (ix *IndexView) SortStableColumns(colIndexes []int, ascending bool)

SortStableColumns sorts the indexes into our Table according to values in given list of column indexes, using either ascending or descending order for all of the columns. Only valid for 1-dimensional columns.

func (*IndexView) Swap

func (ix *IndexView) Swap(i, j int)

Swap switches the indexes for i and j

func (*IndexView) WriteCSV

func (ix *IndexView) WriteCSV(w io.Writer, delim Delims, headers bool) error

WriteCSV writes only rows in table idx view to a comma-separated-values (CSV) file (where comma = any delimiter, specified in the delim arg). If headers = true then generate column headers that capture the type and tensor cell geometry of the columns, enabling full reloading of exactly the same table format and data (recommended). Otherwise, only the data is written.

type LessFunc

type LessFunc func(et *Table, i, j int) bool

LessFunc is a function used for sort comparisons that returns true if Table row i is less than Table row j -- these are the raw row numbers, which have already been projected through indexes when used for sorting via Indexes.

type SplitAgg

type SplitAgg struct {

	// the name of the aggregation operation performed, e.g., Sum, Mean, etc
	Name string

	// column index on which the aggregation was performed -- results will have same shape as cells in this column
	ColumnIndex int

	// aggregation results -- outer index is length of splits, inner is the length of the cell shape for the column
	Aggs [][]float64
}

SplitAgg contains aggregation results for splits

func (*SplitAgg) Clone

func (sa *SplitAgg) Clone() *SplitAgg

Clone returns a cloned copy of our SplitAgg

func (*SplitAgg) CopyFrom

func (sa *SplitAgg) CopyFrom(osa *SplitAgg)

CopyFrom copies from other SplitAgg -- we get our own unique copy of everything

type Splits

type Splits struct {

	// the list of index views for each split
	Splits []*IndexView

	// levels of indexes used to organize the splits -- each split contains the full outer product across these index levels.  for example, if the split was generated by grouping over column values, then these are the column names in order of grouping.  the splits are not automatically sorted hierarchically by these levels but e.g., the GroupBy method produces that result -- use the Sort methods to explicitly sort.
	Levels []string

	// the values of the index levels associated with each split.  The outer dimension is the same length as Splits, and the inner dimension is the levels.
	Values [][]string

	// aggregate results, one for each aggregation operation performed -- split-level data is contained within each SplitAgg struct -- deleting a split removes these aggs but adding new splits just invalidates all existing aggs (they are automatically deleted).
	Aggs []*SplitAgg
	// contains filtered or unexported fields
}

Splits is a list of indexed views into a given Table, that represent a particular way of splitting up the data, e.g., whenever a given column value changes.

It is functionally equivalent to the MultiIndex in Python's pandas: it has multiple levels of indexes as listed in the Levels field, which then have corresponding Values for each split. These index levels can be re-ordered, and new Splits or IndexViews can be created from subsets of the existing levels. The Values are stored simply as string values, as this is the most general type and often index values are labels etc.

For Splits created by the split.GroupBy function for example, each index Level is the column name that the data was grouped by, and the Values for each split are then the values of those columns. However, any arbitrary set of levels and values can be used, e.g., as in the split.GroupByFunc function.

Conceptually, a given Split always contains the full "outer product" of all the index levels -- there is one split for each unique combination of values along each index level. Thus, removing one level collapses across those values and moves the corresponding indexes into the remaining split indexes.

You can Sort and Filter based on the index values directly, to reorganize the splits and drop particular index values, etc.

Splits also maintains Aggs aggregate values for each split, which can be computed using standard aggregation methods over data columns, using the split.Agg* functions.

The table code contains the structural methods for managing the Splits data. See split package for end-user methods to generate different kinds of splits, and perform aggregations, etc.

func (*Splits) AddAgg

func (spl *Splits) AddAgg(name string, colIndex int) *SplitAgg

AddAgg adds a new set of aggregation results for the Splits

func (*Splits) AggByColumnName

func (spl *Splits) AggByColumnName(name string) *SplitAgg

AggByColumnName returns Agg results for given column name, optionally including :Name agg name appended, where Name is the name given to the Agg result (e.g., Mean for a standard Mean agg). Returns nil if not found. See also Try version for error message.

func (*Splits) AggByColumnNameTry

func (spl *Splits) AggByColumnNameTry(name string) (*SplitAgg, error)

AggByColumnNameTry returns Agg results for given column name, optionally including :Name agg name appended, where Name is the name given to the Agg result (e.g., Mean for a standard Mean agg). Returns error message if not found.

func (*Splits) AggByName

func (spl *Splits) AggByName(name string) *SplitAgg

AggByName returns Agg results for given name, which does NOT include the column name, just the name given to the Agg result (e.g., Mean for a standard Mean agg). See also AggByColumnName. Returns nil if not found. See also Try version for error message.

func (*Splits) AggByNameTry

func (spl *Splits) AggByNameTry(name string) (*SplitAgg, error)

AggByNameTry returns Agg results for given name, which does NOT include the column name, just the name given to the Agg result (e.g., Mean for a standard Mean agg). See also AggByColumnName. Returns error message if not found.

func (*Splits) AggsToTable

func (spl *Splits) AggsToTable(colName bool) *Table

AggsToTable returns a Table containing this Splits' aggregate data. Must have Levels and Aggs all created as in the split.Agg* methods. If colName == ColumnNameOnly, then the name of the columns for the Table is just the corresponding agg column name -- otherwise it also includes the name of the aggregation function with a : divider (e.g., Name:Mean)

func (*Splits) AggsToTableCopy

func (spl *Splits) AggsToTableCopy(colName bool) *Table

AggsToTableCopy returns a Table containing this Splits' aggregate data and a copy of the first row of data for each split for all non-agg cols, which is useful for recording other data that goes along with aggregated values. Must have Levels and Aggs all created as in the split.Agg* methods. If colName == ColumnNameOnly, then the name of the columns for the Table is just the corresponding agg column name -- otherwise it also includes the name of the aggregation function with a : divider (e.g., Name:Mean)

func (*Splits) ByValue

func (spl *Splits) ByValue(values []string) []int

ByValue finds split indexes by matching to split values, returns nil if not found. values are used in order as far as they go and any remaining values are assumed to match, and any empty values will match anything. Can use this to access different subgroups within overall set of splits.
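The matching rule described here (values compared in order, missing trailing values match, empty values act as wildcards) can be sketched over a plain [][]string of per-split values. byValue below is a hypothetical free function written from that description, not the method's source.

```go
package main

import "fmt"

// byValue returns the positions of all splits whose values match the
// given pattern. Empty pattern entries match anything, and levels
// beyond the end of the pattern are not checked.
func byValue(splitValues [][]string, values []string) []int {
	var matches []int
	for si, sv := range splitValues {
		ok := true
		for vi, v := range values {
			if vi >= len(sv) {
				break
			}
			if v == "" {
				continue // wildcard
			}
			if sv[vi] != v {
				ok = false
				break
			}
		}
		if ok {
			matches = append(matches, si)
		}
	}
	return matches // nil if nothing matched
}

func main() {
	vals := [][]string{{"a", "x"}, {"a", "y"}, {"b", "x"}}
	fmt.Println(byValue(vals, []string{"a"}))     // [0 1]
	fmt.Println(byValue(vals, []string{"", "x"})) // [0 2]
	fmt.Println(byValue(vals, []string{"c"}))     // []
}
```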

func (*Splits) Clone

func (spl *Splits) Clone() *Splits

Clone returns a cloned copy of our splits

func (*Splits) CopyFrom

func (spl *Splits) CopyFrom(osp *Splits)

CopyFrom copies from other Splits -- we get our own unique copy of everything

func (*Splits) Delete

func (spl *Splits) Delete(idx int)

Delete deletes split at given index -- use this to coordinate deletion of Splits, Values, and Aggs values for given split

func (*Splits) DeleteAggs

func (spl *Splits) DeleteAggs()

DeleteAggs deletes all existing aggregation data

func (*Splits) ExtractLevels

func (spl *Splits) ExtractLevels(levels []int) (*Splits, error)

ExtractLevels returns a new Splits that only has the given levels of indexes, in their given order, with the other levels removed and their corresponding indexes merged into the appropriate remaining levels. Any existing aggregation data is not retained in the new splits.

func (*Splits) Filter

func (spl *Splits) Filter(fun func(idx int) bool)

Filter removes any split for which given function returns false

func (*Splits) Len

func (spl *Splits) Len() int

Len returns number of splits

func (*Splits) Less

func (spl *Splits) Less(i, j int) bool

Less calls the LessFunc for sorting

func (*Splits) New

func (spl *Splits) New(dt *Table, values []string, rows ...int) *IndexView

New adds a new split to the list for given table, and with associated values, which are copied before saving into Values list, and any number of rows from the table associated with this split (also copied). Any existing Aggs are deleted by this.

func (*Splits) ReorderLevels

func (spl *Splits) ReorderLevels(order []int) error

ReorderLevels re-orders the index levels according to the given new ordering indexes; e.g., []int{1,0} will move the current level 0 to level 1, and 1 to level 0. No checking is done to ensure these are sensible beyond a basic length test; behavior is undefined if they are not. Typically you want to call SortLevels after this.

func (*Splits) SetLevels

func (spl *Splits) SetLevels(levels ...string)

SetLevels sets the Levels index names -- must match actual index dimensionality of the Values. This is automatically done by e.g., GroupBy, but must be done manually if creating custom indexes.

func (*Splits) Sort

func (spl *Splits) Sort(lessFunc func(spl *Splits, i, j int) bool)

Sort sorts the splits according to the given Less function.

func (*Splits) SortLevels

func (spl *Splits) SortLevels()

SortLevels sorts the splits according to the current index level ordering of values; i.e., the first index level is the outer sort dimension, then within that is the next, etc.

func (*Splits) SortOrder

func (spl *Splits) SortOrder(order []int) error

SortOrder sorts the splits according to the given ordering of index levels which can be a subset as well

func (*Splits) Swap

func (spl *Splits) Swap(i, j int)

Swap switches the indexes for i and j

func (*Splits) Table

func (spl *Splits) Table() *Table

Table returns the table from the first split (should be the same for all). Returns nil if there are no splits yet.

type SplitsLessFunc

type SplitsLessFunc func(spl *Splits, i, j int) bool

SplitsLessFunc is a function used for sort comparisons that returns true if split i is less than split j

type Table

type Table struct {

	// columns of data, as tensor.Tensor tensors
	Columns []tensor.Tensor `view:"no-inline"`

	// the names of the columns
	ColumnNames []string

	// number of rows, which is enforced to be the size of the outer-most dimension of the column tensors
	Rows int `edit:"-"`

	// the map of column names to column numbers
	ColumnNameMap map[string]int `view:"-"`

	// misc meta data for the table.  We use lower-case key names following the struct tag convention:  name = name of table; desc = description; read-only = gui is read-only; precision = n for precision to write out floats in csv.  For Column-specific data, we look for ColumnName: prefix, specifically ColumnName:desc = description of the column contents, which is shown as tooltip in the tensorview.TableView, and :width for width of a column
	MetaData map[string]string
}

Table is a table of data, with columns of tensors, each with the same number of Rows (outer-most dimension).

func NewSliceTable

func NewSliceTable(st any) (*Table, error)

NewSliceTable returns a new Table with data from the given slice of structs.

func NewTable

func NewTable(rows int, name ...string) *Table

func (*Table) AddColumn

func (dt *Table) AddColumn(tsr tensor.Tensor, name string) error

AddColumn adds the given tensor as a column to the table, returning an error and not adding if the name is not unique. Automatically adjusts the shape to fit the current number of rows.

func (*Table) AddColumnOfType

func (dt *Table) AddColumnOfType(typ reflect.Kind, name string) tensor.Tensor

AddColumnOfType adds a new scalar column to the table, of given reflect type and column name (which must be unique). The cells of this column hold a single (scalar) value of given type. Supported types are string, bool (for tensor.Bits), float32, float64, int, int32, and byte.

func (*Table) AddFloat32Column

func (dt *Table) AddFloat32Column(name string) tensor.Tensor

AddFloat32Column adds a new float32 column with given name. The cells of this column hold a single scalar value.

func (*Table) AddFloat32TensorColumn

func (dt *Table) AddFloat32TensorColumn(name string, cellSizes []int, dimNames ...string) tensor.Tensor

AddFloat32TensorColumn adds a new n-dimensional float32 column with given name and dimensionality of each _cell_. An outer-most Row dimension will be added to this dimensionality to create the tensor column.

func (*Table) AddFloat64Column

func (dt *Table) AddFloat64Column(name string) tensor.Tensor

AddFloat64Column adds a new float64 column with given name. The cells of this column hold a single scalar value.

func (*Table) AddFloat64TensorColumn

func (dt *Table) AddFloat64TensorColumn(name string, cellSizes []int, dimNames ...string) tensor.Tensor

AddFloat64TensorColumn adds a new n-dimensional float64 column with given name and dimensionality of each _cell_. An outer-most Row dimension will be added to this dimensionality to create the tensor column.

func (*Table) AddIntColumn

func (dt *Table) AddIntColumn(name string) tensor.Tensor

AddIntColumn adds a new int column with given name. The cells of this column hold a single scalar value.

func (*Table) AddIntTensorColumn

func (dt *Table) AddIntTensorColumn(name string, cellSizes []int, dimNames ...string) tensor.Tensor

AddIntTensorColumn adds a new n-dimensional int column with given name and dimensionality of each _cell_. An outer-most Row dimension will be added to this dimensionality to create the tensor column.

func (*Table) AddRows

func (dt *Table) AddRows(n int)

AddRows adds n rows to each of the columns

func (*Table) AddStringColumn

func (dt *Table) AddStringColumn(name string) tensor.Tensor

AddStringColumn adds a new String column with given name. The cells of this column hold a single string value.

func (*Table) AddTensorColumnOfType

func (dt *Table) AddTensorColumnOfType(typ reflect.Kind, name string, cellSizes []int, dimNames ...string) tensor.Tensor

AddTensorColumnOfType adds a new n-dimensional column to the table, of given reflect type, column name (which must be unique), and dimensionality of each _cell_. An outer-most Row dimension will be added to this dimensionality to create the tensor column. Supported types are string, bool (for tensor.Bits), float32, float64, int, int32, and byte.

func (*Table) AppendRows

func (dt *Table) AppendRows(dt2 *Table)

AppendRows appends the rows of the input table dt2 to this table, for the columns that are shared by both tables.

func (*Table) Clone

func (dt *Table) Clone() *Table

Clone returns a complete copy of this table

func (*Table) Column

func (dt *Table) Column(i int) tensor.Tensor

Column returns the tensor at given column index

func (*Table) ColumnByName

func (dt *Table) ColumnByName(name string) tensor.Tensor

ColumnByName returns the tensor at given column name without any error messages. Returns nil if not found

func (*Table) ColumnByNameTry

func (dt *Table) ColumnByNameTry(name string) (tensor.Tensor, error)

ColumnByNameTry returns the tensor at given column name, or nil and an error message if not found.

func (*Table) ColumnIndex

func (dt *Table) ColumnIndex(name string) int

ColumnIndex returns the index of the given column name. returns -1 if name not found -- see Try version for error message.

func (*Table) ColumnIndexTry

func (dt *Table) ColumnIndexTry(name string) (int, error)

ColumnIndexTry returns the index of the given column name, along with an error if not found.

func (*Table) ColumnIndexesByNames

func (dt *Table) ColumnIndexesByNames(names []string) []int

ColumnIndexesByNames returns the indexes of the given column names. The returned entry for a name is -1 if that name is not found -- see Try version for error message.

func (*Table) ColumnName

func (dt *Table) ColumnName(i int) string

ColumnName returns the name of given column

func (*Table) CopyCell

func (dt *Table) CopyCell(column string, row int, cpt *Table, cpColNm string, cpRow int) bool

CopyCell copies into cell at given column, row from cell in other table. It is robust to differences in type; uses destination cell type. Returns false if column names are invalid.

func (*Table) CopyMetaDataFrom

func (dt *Table) CopyMetaDataFrom(cp *Table)

CopyMetaDataFrom copies meta data from other table

func (*Table) DeleteAll

func (dt *Table) DeleteAll()

DeleteAll deletes all columns -- full reset

func (*Table) DeleteColumnIndex

func (dt *Table) DeleteColumnIndex(idx int)

DeleteColumnIndex deletes column of given index

func (*Table) DeleteColumnName

func (dt *Table) DeleteColumnName(name string) error

DeleteColumnName deletes column of given name. returns error if not found.

func (*Table) Float

func (dt *Table) Float(column string, row int) float64

Float returns the float64 value of cell at given column (by name), row index for columns that have 1-dimensional tensors. Returns NaN if column is not a 1-dimensional tensor or col name not found, or row not valid.

func (*Table) FloatIndex

func (dt *Table) FloatIndex(column, row int) float64

FloatIndex returns the float64 value of cell at given column, row index for columns that have 1-dimensional tensors. Returns NaN if column is not a 1-dimensional tensor or row not valid.

func (*Table) IsValidRow

func (dt *Table) IsValidRow(row int) bool

IsValidRow returns true if the row is valid

func (*Table) NumColumns

func (dt *Table) NumColumns() int

NumColumns returns the number of columns

func (*Table) NumRows

func (dt *Table) NumRows() int

NumRows returns the number of rows

func (*Table) OpenCSV

func (dt *Table) OpenCSV(filename core.Filename, delim Delims) error

OpenCSV reads a table from a comma-separated-values (CSV) file (where comma = any delimiter, specified in the delim arg), using the Go standard encoding/csv reader conforming to the official CSV standard. If the table does not currently have any columns, the first row of the file is assumed to be headers, and columns are constructed therefrom. If the file was saved from table with headers, then these have full configuration information for tensor type and dimensionality. If the table DOES have existing columns, then those are used robustly for whatever information fits from each row of the file.

func (*Table) OpenFS

func (dt *Table) OpenFS(fsys fs.FS, filename string, delim Delims) error

OpenFS is the version of Table.OpenCSV that uses an fs.FS filesystem.

func (*Table) ReadCSV

func (dt *Table) ReadCSV(r io.Reader, delim Delims) error

ReadCSV reads a table from a comma-separated-values (CSV) file (where comma = any delimiter, specified in the delim arg), using the Go standard encoding/csv reader conforming to the official CSV standard. If the table does not currently have any columns, the first row of the file is assumed to be headers, and columns are constructed therefrom. If the file was saved from table with headers, then these have full configuration information for tensor type and dimensionality. If the table DOES have existing columns, then those are used robustly for whatever information fits from each row of the file.

func (*Table) ReadCSVRow

func (dt *Table) ReadCSVRow(rec []string, row int)

ReadCSVRow reads a record of CSV data into given row in table

func (*Table) RowsByString

func (dt *Table) RowsByString(column string, str string, contains, ignoreCase bool) []int

RowsByString returns the list of rows that have given string value in given column name. Returns nil if name invalid -- see also Try. If contains is true, a row matches when its value contains the string; if ignoreCase is true, case is ignored. Use named args for greater clarity.

func (*Table) RowsByStringIndex

func (dt *Table) RowsByStringIndex(column int, str string, contains, ignoreCase bool) []int

RowsByStringIndex returns the list of rows that have given string value in given column index. if contains, only checks if row contains string; if ignoreCase, ignores case. Use named args for greater clarity.

func (*Table) SaveCSV

func (dt *Table) SaveCSV(filename core.Filename, delim Delims, headers bool) error

SaveCSV writes a table to a comma-separated-values (CSV) file (where comma = any delimiter, specified in the delim arg). If headers = true then generate column headers that capture the type and tensor cell geometry of the columns, enabling full reloading of exactly the same table format and data (recommended). Otherwise, only the data is written.

func (*Table) SetFloat

func (dt *Table) SetFloat(column string, row int, val float64) bool

SetFloat sets the float64 value of cell at given column (by name), row index for columns that have 1-dimensional tensors. Returns true if set.

func (*Table) SetFloatIndex

func (dt *Table) SetFloatIndex(column, row int, val float64) bool

SetFloatIndex sets the float64 value of cell at given column, row index for columns that have 1-dimensional tensors. Returns true if set.

func (*Table) SetMetaData

func (dt *Table) SetMetaData(key, val string)

SetMetaData sets given meta-data key to given value, safely creating the map if not yet initialized. Standard Keys are:

  • name -- name of table
  • desc -- description of table
  • read-only -- makes gui read-only (inactive edits) for tensorview.TableView
  • ColumnName:* -- prefix for all column-specific meta-data
      • desc -- description of column

func (*Table) SetNumRows

func (dt *Table) SetNumRows(rows int)

SetNumRows sets the number of rows in the table, across all columns. If rows = 0, then the effective number of rows in the tensors is 1, as this dimension cannot be 0.

func (*Table) SetString

func (dt *Table) SetString(column string, row int, val string) bool

SetString sets the string value of cell at given column (by name), row index for columns that have 1-dimensional tensors. Returns true if set.

func (*Table) SetStringIndex

func (dt *Table) SetStringIndex(column, row int, val string) bool

SetStringIndex sets the string value of cell at given column, row index for columns that have 1-dimensional tensors. Returns true if set.

func (*Table) SetTensor

func (dt *Table) SetTensor(column string, row int, val tensor.Tensor) bool

SetTensor sets the tensor value of cell at given column (by name), row index for columns that have n-dimensional tensors. Returns true if set.

func (*Table) SetTensorFloat1D

func (dt *Table) SetTensorFloat1D(column string, row int, idx int, val float64) bool

SetTensorFloat1D sets the tensor cell's float cell value at given 1D index within cell, at given column (by name), row index for columns that have n-dimensional tensors. Returns true if set.

func (*Table) SetTensorIndex

func (dt *Table) SetTensorIndex(column, row int, val tensor.Tensor) bool

SetTensorIndex sets the tensor value of cell at given column, row index for columns that have n-dimensional tensors. Returns true if set.

func (*Table) StringIndex

func (dt *Table) StringIndex(column, row int) string

StringIndex returns the string value of cell at given column, row index for columns that have 1-dimensional tensors. Returns "" if column is not a 1-dimensional tensor or row not valid.

func (*Table) StringValue

func (dt *Table) StringValue(column string, row int) string

StringValue returns the string value of cell at given column (by name), row index for columns that have 1-dimensional tensors. Returns "" if column is not a 1-dimensional tensor or row not valid.

func (*Table) TableHeaders

func (dt *Table) TableHeaders() []string

TableHeaders generates special header strings from the table with full information about type and tensor cell dimensionality.

func (*Table) Tensor

func (dt *Table) Tensor(column string, row int) tensor.Tensor

Tensor returns the tensor SubSpace for the given column (by name) and row index, for columns that have higher-dimensional tensors, so that each row is represented by an n-1 dimensional tensor, with the outer dimension being the row number. Returns nil on any error -- see Try version for error returns.

func (*Table) TensorFloat1D

func (dt *Table) TensorFloat1D(column string, row int, idx int) float64

TensorFloat1D returns the float value at the given 1D offset within a tensor cell, for the given column (by name) and row index, for columns that have higher-dimensional tensors, so that each row is represented by an n-1 dimensional tensor, with the outer dimension being the row number. Returns 0 on any error -- see Try version for error returns.
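The idea of a 1D offset within a cell can be sketched under the assumption that a column's values are stored row-major in one flat slice, so that each row owns a contiguous run of cellLen values. cellValue is a hypothetical helper for this illustration, and the storage layout is an assumption rather than something this page states.

```go
package main

import "fmt"

// cellValue returns the idx'th value of the cell at the given row, assuming
// row-major storage with cellLen values per row. ok is false when the row or
// idx is out of range.
func cellValue(data []float64, cellLen, row, idx int) (val float64, ok bool) {
	if cellLen <= 0 || row < 0 || idx < 0 || idx >= cellLen {
		return 0, false
	}
	off := row*cellLen + idx // start of the row's cell, plus offset within it
	if off >= len(data) {
		return 0, false
	}
	return data[off], true
}

func main() {
	// 3 rows, each cell a 2x2 tensor flattened to 4 values.
	data := []float64{
		0, 1, 2, 3,
		10, 11, 12, 13,
		20, 21, 22, 23,
	}
	v, ok := cellValue(data, 4, 1, 2)
	fmt.Println(v, ok) // 12 true
	_, ok = cellValue(data, 4, 3, 0)
	fmt.Println(ok) // false
}
```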

func (*Table) TensorIndex

func (dt *Table) TensorIndex(column, row int) tensor.Tensor

TensorIndex returns the tensor SubSpace for given column, row index for columns that have higher-dimensional tensors so each row is represented by an n-1 dimensional tensor, with the outer dimension being the row number. Returns nil if column is a 1-dimensional tensor or there is any error from the tensor.Tensor.SubSpace call.

func (*Table) UpdateColumnNameMap

func (dt *Table) UpdateColumnNameMap() error

UpdateColumnNameMap updates the column name map, returning an error if any of the column names are duplicates.

func (*Table) WriteCSV

func (dt *Table) WriteCSV(w io.Writer, delim Delims, headers bool) error

WriteCSV writes a table to a comma-separated-values (CSV) file (where comma = any delimiter, specified in the delim arg). If headers = true then generate column headers that capture the type and tensor cell geometry of the columns, enabling full reloading of exactly the same table format and data (recommended). Otherwise, only the data is written.

func (*Table) WriteCSVHeaders

func (dt *Table) WriteCSVHeaders(w io.Writer, delim Delims) (int, error)

WriteCSVHeaders writes headers to a comma-separated-values (CSV) file (where comma = any delimiter, specified in the delim arg). Returns number of columns in header

func (*Table) WriteCSVRow

func (dt *Table) WriteCSVRow(w io.Writer, row int, delim Delims) error

WriteCSVRow writes given row to a comma-separated-values (CSV) file (where comma = any delimiter, specified in the delim arg)

func (*Table) WriteCSVRowWriter

func (dt *Table) WriteCSVRowWriter(cw *csv.Writer, row int, ncol int) error

WriteCSVRowWriter uses csv.Writer to write one row
