qframe

package module
v0.2.2
Published: Sep 22, 2019 License: MIT Imports: 30 Imported by: 16

README

QFrame is an immutable data frame that supports filtering, aggregation and data manipulation. Any operation on a QFrame results in a new QFrame; the original QFrame remains unchanged. This can be done fairly efficiently since much of the underlying data is shared between the two frames.

The design of QFrame has mainly been driven by the requirements of qocache, but it is in many aspects a general purpose data frame. Suggestions for added or improved functionality to support a wider scope are always of interest, as long as they don't conflict with the requirements of qocache! See Contribute.

Installation

go get github.com/tobgu/qframe

Usage

Below are some examples of common use cases. The list is not exhaustive in any way. For a complete description of all operations including more examples see the docs.

IO

QFrames can currently be read from and written to CSV, record oriented JSON, and any SQL database supported by Go's database/sql package.

CSV Data

Read CSV data:

input := `COL1,COL2
a,1.5
b,2.25
c,3.0`

f := qframe.ReadCSV(strings.NewReader(input))
fmt.Println(f)

Output:

COL1(s) COL2(f)
------- -------
      a     1.5
      b    2.25
      c       3

Dims = 2 x 3

SQL Data

QFrame supports reading and writing data via the standard library database/sql drivers. It has been tested with SQLite, Postgres, and MariaDB.

SQLite Example

Load data to and from an in-memory SQLite database. Note that this example requires you to have go-sqlite3 installed prior to running.

package main

import (
	"database/sql"
	"fmt"

	_ "github.com/mattn/go-sqlite3"
	"github.com/tobgu/qframe"
	qsql "github.com/tobgu/qframe/config/sql"
)

func main() {
	// Create a new in-memory SQLite database.
	db, _ := sql.Open("sqlite3", ":memory:")
	// Add a new table.
	db.Exec(`
	CREATE TABLE test (
		COL1 INT,
		COL2 REAL,
		COL3 TEXT,
		COL4 BOOL
	);`)
	// Create a new QFrame to populate our table with.
	qf := qframe.New(map[string]interface{}{
		"COL1": []int{1, 2, 3},
		"COL2": []float64{1.1, 2.2, 3.3},
		"COL3": []string{"one", "two", "three"},
		"COL4": []bool{true, true, true},
	})
	fmt.Println(qf)
	// Start a new SQL Transaction.
	tx, _ := db.Begin()
	// Write the QFrame to the database.
	qf.ToSQL(tx,
		// Write only to the test table
		qsql.Table("test"),
		// Explicitly set SQLite compatibility.
		qsql.SQLite(),
	)
	// Create a new QFrame from SQL.
	newQf := qframe.ReadSQL(tx,
		// A query must return at least one column. In this 
		// case it will return all of the columns we created above.
		qsql.Query("SELECT * FROM test"),
		// SQLite stores boolean values as integers, so we
		// can coerce them back to bools with the CoercePair option.
		qsql.Coerce(qsql.CoercePair{Column: "COL4", Type: qsql.Int64ToBool}),
		qsql.SQLite(),
	)
	fmt.Println(newQf)
	fmt.Println(newQf.Equals(qf))
}

Output:

COL1(i) COL2(f) COL3(s) COL4(b)
------- ------- ------- -------
      1     1.1     one    true
      2     2.2     two    true
      3     3.3   three    true

Dims = 4 x 3
true 

Filtering

Filtering can be done either by applying individual filters to the QFrame or by combining filters using AND and OR.

Filter with OR-clause:

f := qframe.New(map[string]interface{}{"COL1": []int{1, 2, 3}, "COL2": []string{"a", "b", "c"}})
newF := f.Filter(qframe.Or(
    qframe.Filter{Column: "COL1", Comparator: ">", Arg: 2},
    qframe.Filter{Column: "COL2", Comparator: "=", Arg: "a"}))
fmt.Println(newF)

Output:

COL1(i) COL2(s)
------- -------
      1       a
      3       c

Dims = 2 x 2

Grouping and aggregation

Grouping and aggregation are done in two distinct steps. The function used in the aggregation step takes a slice of elements and returns a single element. For floats this function signature matches many of the statistical functions in Gonum, which can hence be applied directly.

intSum := func(xx []int) int {
    result := 0
    for _, x := range xx {
        result += x
    }
    return result
}

f := qframe.New(map[string]interface{}{"COL1": []int{1, 2, 2, 3, 3}, "COL2": []string{"a", "b", "c", "a", "b"}})
f = f.GroupBy(groupby.Columns("COL2")).Aggregate(qframe.Aggregation{Fn: intSum, Column: "COL1"})
fmt.Println(f.Sort(qframe.Order{Column: "COL2"}))

Output:

COL2(s) COL1(i)
------- -------
      a       4
      b       5
      c       2

Dims = 2 x 3

Data manipulation

There are two different functions by which data can be manipulated: Apply and Eval. Eval is slightly more high level and takes a more data-driven approach, but in the end it boils down to a series of Apply calls.

Example using Apply to string concatenate two columns:

f := qframe.New(map[string]interface{}{"COL1": []int{1, 2, 3}, "COL2": []string{"a", "b", "c"}})
f = f.Apply(
    qframe.Instruction{Fn: function.StrI, DstCol: "COL1", SrcCol1: "COL1"},
    qframe.Instruction{Fn: function.ConcatS, DstCol: "COL3", SrcCol1: "COL1", SrcCol2: "COL2"})
fmt.Println(f.Select("COL3"))

Output:

COL3(s)
-------
     1a
     2b
     3c

Dims = 1 x 3

The same example using Eval instead:

f := qframe.New(map[string]interface{}{"COL1": []int{1, 2, 3}, "COL2": []string{"a", "b", "c"}})
f = f.Eval("COL3", qframe.Expr("+", qframe.Expr("str", types.ColumnName("COL1")), types.ColumnName("COL2")))
fmt.Println(f.Select("COL3"))

More usage examples

Examples of the most common operations are available in the docs.

Error handling

All operations that may result in errors will set the Err variable on the returned QFrame to indicate that an error occurred. The presence of an error on the QFrame will prevent any future operations from being executed on the frame (i.e. it follows a monad-like pattern). This allows for smooth chaining of multiple operations without having to explicitly check errors between each operation.
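The error-carrying pattern can be illustrated with a minimal, standard-library-only sketch. The frame type, mapVals method and helper functions below are hypothetical and are not part of the qframe API; they only show how an operation can become a no-op once an earlier step has failed.

```go
package main

import (
	"errors"
	"fmt"
)

// frame is a hypothetical stand-in for an immutable frame that carries an error.
type frame struct {
	vals []int
	Err  error
}

// mapVals returns a new frame with fn applied to every value.
// If an earlier operation failed, it does nothing and passes the error on.
func (f frame) mapVals(fn func(int) (int, error)) frame {
	if f.Err != nil {
		return f
	}
	out := make([]int, len(f.vals))
	for i, v := range f.vals {
		r, err := fn(v)
		if err != nil {
			return frame{Err: err}
		}
		out[i] = r
	}
	return frame{vals: out}
}

var errZero = errors.New("division by zero")

// invert100 computes 100/x and fails for x == 0.
func invert100(x int) (int, error) {
	if x == 0 {
		return 0, errZero
	}
	return 100 / x, nil
}

// double never fails.
func double(x int) (int, error) { return 2 * x, nil }

func main() {
	ok := frame{vals: []int{1, 2, 4}}.mapVals(invert100).mapVals(double)
	fmt.Println(ok.vals, ok.Err) // [200 100 50] <nil>

	// The second step is skipped because the first one set Err.
	bad := frame{vals: []int{1, 0, 4}}.mapVals(invert100).mapVals(double)
	fmt.Println(bad.vals, bad.Err) // [] division by zero
}
```

With this shape the caller only needs to inspect Err once, at the end of the chain.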

Configuration parameters

API functions that require configuration parameters make use of functional options to allow more options to be easily added in the future in a backwards compatible way.
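A minimal sketch of the functional options pattern follows. The csvConfig type and the delimiter and emptyNull options are hypothetical names chosen for illustration; they are not taken from the qframe config packages.

```go
package main

import "fmt"

// csvConfig is a hypothetical configuration struct.
type csvConfig struct {
	delimiter byte
	emptyNull bool
}

// configFunc mutates a configuration; each option is one such function.
type configFunc func(*csvConfig)

// delimiter returns an option that sets the field separator.
func delimiter(d byte) configFunc {
	return func(c *csvConfig) { c.delimiter = d }
}

// emptyNull returns an option that controls how empty fields are treated.
func emptyNull(b bool) configFunc {
	return func(c *csvConfig) { c.emptyNull = b }
}

// newCSVConfig starts from defaults and applies the options in order.
func newCSVConfig(fns ...configFunc) csvConfig {
	c := csvConfig{delimiter: ','}
	for _, fn := range fns {
		fn(&c)
	}
	return c
}

func main() {
	def := newCSVConfig()
	fmt.Printf("%q %v\n", def.delimiter, def.emptyNull) // ',' false

	custom := newCSVConfig(delimiter(';'), emptyNull(true))
	fmt.Printf("%q %v\n", custom.delimiter, custom.emptyNull) // ';' true
}
```

A new option can be added later as one more function returning a configFunc, without changing the signature of newCSVConfig or breaking existing callers.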

Design goals

  • Performance
    • Speed should be on par with, or better than, Python Pandas for corresponding operations.
    • No or very little memory overhead per data element.
    • The performance impact of operations should be straightforward to reason about.
  • API
    • Should be reasonably small and low ceremony.
    • Should allow custom, user-provided functions to be used for data processing.
    • Should provide built-in functions for the most common operations.

High level design

A QFrame is a collection of columns which can be of type int, float, string, bool or enum. For more information about the data types see the types docs.

In addition to the columns there is also an index which controls which rows in the columns are part of the QFrame and the sort order of these rows. Many operations on QFrames only affect the index; the underlying data remains the same.
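The idea of an index on top of shared column data can be sketched as follows. This is a simplified, hypothetical single-column model (the intFrame type and its methods are assumptions made for illustration), not the actual QFrame internals: filtering and sorting build a new index while the data slice is reused untouched.

```go
package main

import (
	"fmt"
	"sort"
)

// intFrame is a hypothetical single-column frame.
// data is shared between frames and never modified; index selects and orders rows.
type intFrame struct {
	data  []int
	index []uint32
}

// newIntFrame creates a frame whose index covers all rows in their original order.
func newIntFrame(data []int) intFrame {
	idx := make([]uint32, len(data))
	for i := range idx {
		idx[i] = uint32(i)
	}
	return intFrame{data: data, index: idx}
}

// filter returns a frame with a new index holding only the rows accepted by keep.
func (f intFrame) filter(keep func(int) bool) intFrame {
	var idx []uint32
	for _, i := range f.index {
		if keep(f.data[i]) {
			idx = append(idx, i)
		}
	}
	return intFrame{data: f.data, index: idx}
}

// sorted returns a frame with a reordered copy of the index.
func (f intFrame) sorted() intFrame {
	idx := append([]uint32(nil), f.index...)
	sort.SliceStable(idx, func(a, b int) bool { return f.data[idx[a]] < f.data[idx[b]] })
	return intFrame{data: f.data, index: idx}
}

// values materialises the rows selected by the index.
func (f intFrame) values() []int {
	out := make([]int, len(f.index))
	for i, ix := range f.index {
		out[i] = f.data[ix]
	}
	return out
}

func main() {
	f := newIntFrame([]int{30, 10, 40, 20})
	g := f.filter(func(x int) bool { return x > 10 }).sorted()
	fmt.Println(f.values()) // [30 10 40 20]
	fmt.Println(g.values()) // [20 30 40]
}
```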

Many functions and methods in qframe take the empty interface as a parameter, for example for functions to be applied or for string references to internal functions. These always correspond to a union/sum type with a fixed set of valid types that are checked at runtime through type switches (hardly any reflection is used in QFrame, for performance reasons). Which types are valid depends on the function called and the column type that is affected. Modelling this statically is hard/impossible in Go, hence the dynamic approach. If you plan to use QFrame with datasets with fixed layout and types it should be a small task to write tiny wrappers for the types you are using to regain static type safety.
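As a rough illustration of this approach, the sketch below accepts an empty interface, resolves it with a type switch against a fixed set of valid types, and then wraps the dynamic call in a small statically typed helper. The applyInt and doubleInts functions are hypothetical and do not mirror the exact set of types qframe accepts.

```go
package main

import "fmt"

// applyInt is a hypothetical helper. fn may be an int constant,
// a generator func() int, or a transformation func(int) int.
func applyInt(col []int, fn interface{}) ([]int, error) {
	out := make([]int, len(col))
	switch t := fn.(type) {
	case int:
		for i := range out {
			out[i] = t
		}
	case func() int:
		for i := range out {
			out[i] = t()
		}
	case func(int) int:
		for i, v := range col {
			out[i] = t(v)
		}
	default:
		// Any other type is rejected at runtime.
		return nil, fmt.Errorf("unsupported function type %T", fn)
	}
	return out, nil
}

// doubleInts is a thin wrapper that restores static type safety for one use case.
func doubleInts(col []int) []int {
	out, _ := applyInt(col, func(x int) int { return 2 * x })
	return out
}

func main() {
	col := []int{1, 2, 3}

	constant, _ := applyInt(col, 7)
	fmt.Println(constant) // [7 7 7]

	n := 0
	generated, _ := applyInt(col, func() int { n++; return n })
	fmt.Println(generated) // [1 2 3]

	fmt.Println(doubleInts(col)) // [2 4 6]

	_, err := applyInt(col, "upper")
	fmt.Println(err) // unsupported function type string
}
```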

Limitations

  • The API cannot yet be considered stable.
  • The maximum number of rows in a QFrame is 4294967296 (2^32).
  • The CSV parser only handles ASCII characters as separators.
  • Individual strings cannot be longer than 268 MB (2^28 bytes).
  • A string column cannot contain more than a total of 34 GB (2^35 bytes).
  • At the moment you cannot rely on any of the errors returned to fulfill anything other than the error interface. In the future this will hopefully be improved to provide more help in identifying the root cause of errors.
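The numeric limits above are powers of two. The following snippet is only a quick check of the arithmetic; the constant names are made up for this example and are not exported by qframe.

```go
package main

import "fmt"

// Hypothetical names for the limits listed above.
const (
	maxRows        = uint64(1) << 32 // rows per QFrame
	maxStringBytes = uint64(1) << 28 // bytes per individual string
	maxColumnBytes = uint64(1) << 35 // bytes per string column
)

func main() {
	fmt.Println(maxRows)        // 4294967296
	fmt.Println(maxStringBytes) // 268435456, roughly 268 MB
	fmt.Println(maxColumnBytes) // 34359738368, roughly 34 GB
}
```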

Performance/benchmarks

There are a number of benchmarks in qbench comparing QFrame to Pandas and Gota where applicable.

Other data frames

The work on QFrame has been inspired by Python Pandas and Gota.

Contribute

Want to contribute? Great! Open an issue on GitHub and let the discussions begin! Below are some instructions for working with the QFrame repo.

Ideas for further work

Below are some ideas of areas where contributions would be welcome.

  • Support for more input and output formats.
  • Support for additional column formats.
  • Support for using the Arrow format for columns.
  • General CPU and memory optimizations.
  • Improve documentation.
  • More analytical functionality.
  • Dataset joins.
  • Improved interoperability with other libraries in the Go data science ecosystem.
  • Improve string representation of QFrames.

Install dependencies

make dev-deps

Tests

Please contribute tests together with any code. The tests should be written against the public API to avoid lockdown of the implementation and internal structure which would make it more difficult to change in the future.

Run tests: make test

This will also trigger code to be regenerated.

Code generation

The codebase contains some generated code to reduce the amount of duplication required for similar functionality across different column types. Generated code is recognized by file names ending with _gen.go. These files must never be edited directly.

To trigger code generation: make generate

Documentation

Overview

Package qframe holds the main QFrame implementation and acts as an entry point to QFrame.

Constants

This section is empty.

Variables

This section is empty.

Functions

func Doc

func Doc() string

Doc returns a generated documentation string that states which built-in filters, aggregations and transformations exist for each column type.

Types

type Aggregation

type Aggregation struct {
	// Fn is the aggregation function to apply.
	//
	// IMPORTANT: For pointer and reference types you must not assume that the data passed as argument
	// to this function is valid after the function returns. If you plan to keep it around you need
	// to take a copy of the data.
	Fn types.SliceFuncOrBuiltInId

	// Column is the name of the column to apply the aggregation to.
	Column string
}

Aggregation represents a function to apply to a column.

type AndClause

type AndClause comboClause

AndClause represents the logical conjunction of multiple clauses.

func And

func And(clauses ...FilterClause) AndClause

And returns a new AndClause that represents the conjunction of the passed filter clauses.

func (AndClause) Err

func (c AndClause) Err() error

Err returns any error that may have occurred during creation of the filter.

func (AndClause) String

func (c AndClause) String() string

String returns a textual description of the filter.

type BoolView

type BoolView struct {
	bcolumn.View
}

BoolView provides a "view" into a bool column and can be used for access to individual elements.

type ConstBool

type ConstBool struct {
	Val   bool
	Count int
}

ConstBool describes a bool column with only one value. It can be used during construction of new QFrames.

type ConstFloat

type ConstFloat struct {
	Val   float64
	Count int
}

ConstFloat describes a float column with only one value. It can be used during construction of new QFrames.

type ConstInt

type ConstInt struct {
	Val   int
	Count int
}

ConstInt describes an int column with only one value. It can be used during construction of new QFrames.

type ConstString

type ConstString struct {
	Val   *string
	Count int
}

ConstString describes a string column with only one value. It can be used during construction of new QFrames.

type EnumView

type EnumView struct {
	ecolumn.View
}

EnumView provides a "view" into an enum column and can be used for access to individual elements.

type Expression

type Expression interface {

	// Err returns an error if the expression could not be constructed for some reason.
	Err() error
	// contains filtered or unexported methods
}

Expression is an internal interface representing an expression that can be executed on a QFrame.

func Expr

func Expr(name string, args ...interface{}) Expression

Expr represents an expression with one or more arguments. The arguments may be values, columns or the result of other expressions.

If more arguments than two are passed, the expression will be evaluated by repeatedly applying the function to pairwise elements from the left. Temporary columns will be created as necessary to hold intermediate results.

Pseudo example:

["/", 18, 2, 3] is evaluated as ["/", ["/", 18, 2], 3] (= 3)
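The pairwise evaluation from the left is a left fold. The sketch below reproduces the pseudo example with plain integers; foldLeft is a hypothetical helper written for this illustration and is not how Expr is implemented internally.

```go
package main

import "fmt"

// foldLeft applies op pairwise from the left:
// op(op(first, rest[0]), rest[1]) and so on.
func foldLeft(op func(a, b int) int, first int, rest ...int) int {
	acc := first
	for _, x := range rest {
		acc = op(acc, x)
	}
	return acc
}

func main() {
	div := func(a, b int) int { return a / b }
	// (18 / 2) / 3
	fmt.Println(foldLeft(div, 18, 2, 3)) // 3
}
```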

func Val

func Val(value interface{}) Expression

Val represents a constant or column.

type Filter

type Filter filter.Filter

Filter is the lowest level in a filter clause. See the docs for filter.Filter for an in depth description of the fields.

func (Filter) Err

func (c Filter) Err() error

Err returns any error that may have occurred during creation of the filter.

func (Filter) String

func (c Filter) String() string

String returns a textual description of the filter.

type FilterClause

type FilterClause interface {
	fmt.Stringer

	Err() error
	// contains filtered or unexported methods
}

FilterClause is an internal interface representing a filter of some kind that can be applied on a QFrame.

type FloatView

type FloatView struct {
	fcolumn.View
}

FloatView provides a "view" into a float column and can be used for access to individual elements.

type GroupStats

type GroupStats grouper.GroupStats

GroupStats contains internal statistics for grouping. Clients should not depend on this for any type of decision making. It is strictly "for info". The layout may change if the underlying grouping mechanisms change.

type Grouper

type Grouper struct {
	Err   error
	Stats GroupStats
	// contains filtered or unexported fields
}

Grouper contains groups of rows produced by the QFrame.GroupBy function.

func (Grouper) Aggregate

func (g Grouper) Aggregate(aggs ...Aggregation) QFrame

Aggregate applies the given aggregations to all row groups in the Grouper.

Time complexity O(m*n) where m = number of aggregations, n = number of rows.

type Instruction

type Instruction struct {
	// Fn is the function to apply.
	//
	// IMPORTANT: For pointer and reference types you must not assume that the data passed as argument
	// to this function is valid after the function returns. If you plan to keep it around you need
	// to take a copy of the data.
	Fn types.DataFuncOrBuiltInId

	// DstCol is the name of the column that the result of applying Fn should be stored in.
	DstCol string

	// SrcCol1 is the first column to take arguments to Fn from.
	// This field is optional and must only be set if Fn takes one or more arguments.
	SrcCol1 string

	// SrcCol2 is the second column to take arguments to Fn from.
	// This field is optional and must only be set if Fn takes two arguments.
	SrcCol2 string
}

Instruction describes an operation that will be applied to a row in the QFrame.

type IntView

type IntView struct {
	icolumn.View
}

IntView provides a "view" into an int column and can be used for access to individual elements.

type NotClause

type NotClause struct {
	// contains filtered or unexported fields
}

NotClause represents the logical inverse of a filter clause.

func Not

func Not(c FilterClause) NotClause

Not creates a new NotClause that represents the inverse of the passed filter clause.

func (NotClause) Err

func (c NotClause) Err() error

Err returns any error that may have occurred during creation of the filter.

func (NotClause) String

func (c NotClause) String() string

String returns a textual description of the filter clause.

type NullClause

type NullClause struct{}

NullClause is a convenience type to simplify clients when no filtering is to be done.

func Null

func Null() NullClause

Null returns a new NullClause.

func (NullClause) Err

func (c NullClause) Err() error

Err for NullClause always returns nil.

func (NullClause) String

func (c NullClause) String() string

String for NullClause always returns an empty string.

type OrClause

type OrClause comboClause

OrClause represents the logical disjunction of multiple clauses.

func Or

func Or(clauses ...FilterClause) OrClause

Or returns a new OrClause that represents the disjunction of the passed filter clauses.

func (OrClause) Err

func (c OrClause) Err() error

Err returns any error that may have occurred during creation of the filter.

func (OrClause) String

func (c OrClause) String() string

String returns a textual description of the filter.

type Order

type Order struct {
	// Column is the name of the column to sort by.
	Column string

	// Reverse specifies if sorting should be performed ascending (false, default) or descending (true)
	Reverse bool

	// NullLast specifies if null values should go last (true) or first (false, default) for columns that support null.
	NullLast bool
}

Order is used to specify how sorting should be performed.

type QFrame

type QFrame struct {

	// Err indicates that an error has occurred while running an operation.
	// If Err is set it will prevent any further operations from being executed
	// on the QFrame.
	Err error
	// contains filtered or unexported fields
}

QFrame holds a number of columns together and offers methods for filtering, group+aggregate and data manipulation.

Example (ApplyConstant)
package main

import (
	"fmt"

	"github.com/tobgu/qframe"
)

func main() {
	f := qframe.New(map[string]interface{}{"COL1": []int{1, 2, 3}})
	f = f.Apply(qframe.Instruction{Fn: 1.5, DstCol: "COL2"})
	fmt.Println(f)

}
Output:

COL1(i) COL2(f)
------- -------
      1     1.5
      2     1.5
      3     1.5

Dims = 2 x 3

Example (ApplyGenerator)
package main

import (
	"fmt"

	"github.com/tobgu/qframe"
)

func main() {
	val := -1
	generator := func() int {
		val++
		return val
	}

	f := qframe.New(map[string]interface{}{"COL1": []int{1, 2, 3}})
	f = f.Apply(qframe.Instruction{Fn: generator, DstCol: "COL2"})
	fmt.Println(f)

}
Output:

COL1(i) COL2(i)
------- -------
      1       0
      2       1
      3       2

Dims = 2 x 3

Example (ApplyStrConcat)
package main

import (
	"fmt"

	"github.com/tobgu/qframe"
	"github.com/tobgu/qframe/function"
)

func main() {
	// String concatenating COL1 and COL2.
	f := qframe.New(map[string]interface{}{"COL1": []int{1, 2, 3}, "COL2": []string{"a", "b", "c"}})
	f = f.Apply(
		qframe.Instruction{Fn: function.StrI, DstCol: "COL1", SrcCol1: "COL1"},
		qframe.Instruction{Fn: function.ConcatS, DstCol: "COL3", SrcCol1: "COL1", SrcCol2: "COL2"})
	fmt.Println(f.Select("COL3"))

}
Output:

COL3(s)
-------
     1a
     2b
     3c

Dims = 1 x 3
Example (Distinct)
package main

import (
	"fmt"

	"github.com/tobgu/qframe"
	"github.com/tobgu/qframe/config/groupby"
)

func main() {
	qf := qframe.New(map[string]interface{}{
		"COL1": []string{"a", "b", "a", "b", "b", "c"},
		"COL2": []int{0, 1, 2, 4, 4, 6},
	})

	qf = qf.Distinct(groupby.Columns("COL1", "COL2")).Sort(qframe.Order{Column: "COL1"}, qframe.Order{Column: "COL2"})

	fmt.Println(qf)

}
Output:

COL1(s) COL2(i)
------- -------
      a       0
      a       2
      b       1
      b       4
      c       6

Dims = 2 x 5
Example (EvalStrConcat)
package main

import (
	"fmt"

	"github.com/tobgu/qframe"
	"github.com/tobgu/qframe/types"
)

func main() {
	// Same example as for apply but using Eval instead.
	f := qframe.New(map[string]interface{}{"COL1": []int{1, 2, 3}, "COL2": []string{"a", "b", "c"}})
	f = f.Eval("COL3", qframe.Expr("+", qframe.Expr("str", types.ColumnName("COL1")), types.ColumnName("COL2")))
	fmt.Println(f.Select("COL3"))

}
Output:

COL3(s)
-------
     1a
     2b
     3c

Dims = 1 x 3
Example (FilterBuiltin)
package main

import (
	"fmt"

	"github.com/tobgu/qframe"
)

func main() {
	f := qframe.New(map[string]interface{}{"COL1": []int{1, 2, 3}, "COL2": []string{"a", "b", "c"}})
	newF := f.Filter(qframe.Filter{Column: "COL1", Comparator: ">", Arg: 1})
	fmt.Println(newF)

}
Output:

COL1(i) COL2(s)
------- -------
      2       b
      3       c

Dims = 2 x 2
Example (FilterCustomFunc)
package main

import (
	"fmt"

	"github.com/tobgu/qframe"
)

func main() {
	f := qframe.New(map[string]interface{}{"COL1": []int{1, 2, 3}, "COL2": []string{"a", "b", "c"}})
	isOdd := func(x int) bool { return x&1 > 0 }
	newF := f.Filter(qframe.Filter{Column: "COL1", Comparator: isOdd})
	fmt.Println(newF)

}
Output:

COL1(i) COL2(s)
------- -------
      1       a
      3       c

Dims = 2 x 2
Example (FilterWithOrClause)
package main

import (
	"fmt"

	"github.com/tobgu/qframe"
)

func main() {
	f := qframe.New(map[string]interface{}{"COL1": []int{1, 2, 3}, "COL2": []string{"a", "b", "c"}})
	newF := f.Filter(qframe.Or(
		qframe.Filter{Column: "COL1", Comparator: ">", Arg: 2},
		qframe.Filter{Column: "COL2", Comparator: "=", Arg: "a"}))
	fmt.Println(newF)

}
Output:

COL1(i) COL2(s)
------- -------
      1       a
      3       c

Dims = 2 x 2
Example (GroupByAggregate)
package main

import (
	"fmt"

	"github.com/tobgu/qframe"
	"github.com/tobgu/qframe/config/groupby"
)

func main() {
	intSum := func(xx []int) int {
		result := 0
		for _, x := range xx {
			result += x
		}
		return result
	}

	f := qframe.New(map[string]interface{}{"COL1": []int{1, 2, 2, 3, 3}, "COL2": []string{"a", "b", "c", "a", "b"}})
	f = f.GroupBy(groupby.Columns("COL2")).Aggregate(qframe.Aggregation{Fn: intSum, Column: "COL1"})
	fmt.Println(f.Sort(qframe.Order{Column: "COL2"}))

}
Output:

COL2(s) COL1(i)
------- -------
      a       4
      b       5
      c       2

Dims = 2 x 3
Example (GroupByCount)
package main

import (
	"fmt"

	"github.com/tobgu/qframe"
	"github.com/tobgu/qframe/config/groupby"
)

func main() {
	qf := qframe.New(map[string]interface{}{
		"COL1": []string{"a", "b", "a", "b", "b", "c"},
		"COL2": []float64{0.1, 0.1, 0.2, 0.4, 0.5, 0.6},
	})

	g := qf.GroupBy(groupby.Columns("COL1"))
	qf = g.Aggregate(qframe.Aggregation{Fn: "count", Column: "COL2"}).Sort(qframe.Order{Column: "COL1"})

	fmt.Println(qf)

}
Output:

COL1(s) COL2(i)
------- -------
      a       2
      b       3
      c       1

Dims = 2 x 3
Example (Iter)
package main

import (
	"fmt"

	"github.com/tobgu/qframe"
	"github.com/tobgu/qframe/types"
)

func main() {
	qf := qframe.New(map[string]interface{}{
		"COL1": []string{"a", "b", "c"},
		"COL2": []int{0, 1, 2},
		"COL3": []string{"d", "e", "f"},
		"COL4": []int{3, 4, 5},
	})
	named := qf.ColumnTypeMap()
	for _, col := range qf.ColumnNames() {
		if named[col] == types.Int {
			view := qf.MustIntView(col)
			for i := 0; i < view.Len(); i++ {
				fmt.Println(view.ItemAt(i))
			}
		}
	}

}
Output:

0
1
2
3
4
5
Example (SortWithEnum)
package main

import (
	"fmt"

	"github.com/tobgu/qframe"
	"github.com/tobgu/qframe/config/newqf"
)

func main() {
	f := qframe.New(
		map[string]interface{}{"COL1": []string{"abc", "def", "ghi"}, "COL2": []string{"a", "b", "c"}},
		newqf.Enums(map[string][]string{"COL2": {"c", "b", "a"}}))
	fmt.Println(f)
	fmt.Println("\nSorted according to enum spec:")
	fmt.Println(f.Sort(qframe.Order{Column: "COL2"}))
}
Output:

COL1(s) COL2(e)
------- -------
    abc       a
    def       b
    ghi       c

Dims = 2 x 3

Sorted according to enum spec:
COL1(s) COL2(e)
------- -------
    ghi       c
    def       b
    abc       a

Dims = 2 x 3
Example (View)
package main

import (
	"fmt"

	"github.com/tobgu/qframe"
)

func main() {
	f := qframe.New(map[string]interface{}{"COL1": []int{1, 2, 3}})
	v, _ := f.IntView("COL1")
	fmt.Println(v.Slice())

}
Output:

[1 2 3]

func New

func New(data map[string]types.DataSlice, fns ...newqf.ConfigFunc) QFrame

New creates a new QFrame with column content from data.

Time complexity O(m * n) where m = number of columns, n = number of rows.

Example
package main

import (
	"fmt"
	"math"

	"github.com/tobgu/qframe"
	"github.com/tobgu/qframe/config/newqf"
)

func main() {
	a, c := "a", "c"
	f := qframe.New(map[string]interface{}{
		"COL1": []int{1, 2, 3},
		"COL2": []float64{1.5, 2.5, math.NaN()},
		"COL3": []string{"a", "b", "c"},
		"COL4": []*string{&a, nil, &c},
		"COL5": []bool{false, false, true}},
		newqf.ColumnOrder("COL5", "COL4", "COL3", "COL2", "COL1"))
	fmt.Println(f)
}
Output:

COL5(b) COL4(s) COL3(s) COL2(f) COL1(i)
------- ------- ------- ------- -------
  false       a       a     1.5       1
  false    null       b     2.5       2
   true       c       c    null       3

Dims = 5 x 3

func ReadCSV

func ReadCSV(reader io.Reader, confFuncs ...csv.ConfigFunc) QFrame

ReadCSV returns a QFrame with data, in CSV format, taken from reader. Column data types are auto detected if not explicitly specified.

Time complexity O(m * n) where m = number of columns, n = number of rows.

Example
package main

import (
	"fmt"
	"strings"

	"github.com/tobgu/qframe"
)

func main() {
	input := `COL1,COL2
a,1.5
b,2.25
c,3.0`

	f := qframe.ReadCSV(strings.NewReader(input))
	fmt.Println(f)
}
Output:

COL1(s) COL2(f)
------- -------
      a     1.5
      b    2.25
      c       3

Dims = 2 x 3

func ReadJSON

func ReadJSON(reader io.Reader, fns ...newqf.ConfigFunc) QFrame

ReadJSON returns a QFrame with data, in JSON format, taken from reader.

Time complexity O(m * n) where m = number of columns, n = number of rows.

func ReadSQL

func ReadSQL(tx *sql.Tx, confFuncs ...qsql.ConfigFunc) QFrame

ReadSQL returns a QFrame by reading the results of a SQL query.

func (QFrame) Apply

func (qf QFrame) Apply(instructions ...Instruction) QFrame

Apply applies instructions to each row in the QFrame.

Time complexity O(m * n), where m = number of instructions, n = number of rows.

func (QFrame) BoolView

func (qf QFrame) BoolView(colName string) (BoolView, error)

BoolView returns a view into a bool column identified by name.

colName - Name of the column.

Returns an error if the column is missing or of wrong type. Time complexity O(1).

func (QFrame) ByteSize

func (qf QFrame) ByteSize() int

ByteSize returns a best effort estimate of the current size occupied by the QFrame.

This does not factor for cases where multiple, different, frames reference the same underlying data.

Time complexity O(m) where m is the number of columns in the QFrame.

func (QFrame) ColumnNames

func (qf QFrame) ColumnNames() []string

ColumnNames returns the names of all columns in the QFrame.

Time complexity O(n) where n = number of columns.

func (QFrame) ColumnTypeMap

func (qf QFrame) ColumnTypeMap() map[string]types.DataType

ColumnTypeMap returns a map of each underlying column with the column name as a key and its types.DataType as a value.

Time complexity O(n) where n = number of columns.

func (QFrame) ColumnTypes

func (qf QFrame) ColumnTypes() []types.DataType

ColumnTypes returns the types.DataType of all underlying columns.

Time complexity O(n) where n = number of columns.

func (QFrame) Contains

func (qf QFrame) Contains(colName string) bool

Contains reports if a column with colName is present in the frame.

Time complexity is O(1).

func (QFrame) Copy

func (qf QFrame) Copy(dstCol, srcCol string) QFrame

Copy copies the content of srcCol into dstCol.

dstCol - Name of the column to copy to.
srcCol - Name of the column to copy from.

Time complexity O(1). Under the hood no actual copy takes place. The columns will share the underlying data. Since the frame is immutable this is safe.

func (QFrame) Distinct

func (qf QFrame) Distinct(configFns ...groupby.ConfigFunc) QFrame

Distinct returns a new QFrame that only contains unique rows with respect to the specified columns. If no columns are given Distinct will return rows where all columns are unique.

The order of the returned rows is undefined.

Time complexity O(m * n) where m = number of columns to compare for distinctness, n = number of rows.

func (QFrame) Drop

func (qf QFrame) Drop(columns ...string) QFrame

Drop creates a new projection of the QFrame without the specified columns.

Time complexity O(1).

func (QFrame) EnumView

func (qf QFrame) EnumView(colName string) (EnumView, error)

EnumView returns a view into an enum column identified by name.

colName - Name of the column.

Returns an error if the column is missing or of wrong type. Time complexity O(1).

func (QFrame) Equals

func (qf QFrame) Equals(other QFrame) (equal bool, reason string)

Equals compares this QFrame to another QFrame. If the QFrames are equal (true, "") will be returned else (false, <string describing why>) will be returned.

Time complexity O(m * n) where m = number of columns, n = number of rows.

func (QFrame) Eval

func (qf QFrame) Eval(dstCol string, expr Expression, ff ...eval.ConfigFunc) QFrame

Eval evaluates an expression assigning the result to dstCol.

Eval can be considered an abstraction over Apply. For example it handles management of intermediate/temporary columns that are needed as part of evaluating more complex expressions.

Time complexity O(m*n) where m = number of clauses in the expression, n = number of rows.

func (QFrame) Filter

func (qf QFrame) Filter(clause FilterClause) QFrame

Filter filters the frame according to the filters in clause.

Filters are applied via depth first traversal of the provided filter clause from left to right. Use the following rules of thumb for best performance when constructing filters:

  1. Cheap filters (e.g. integer comparisons, ...) should go to the left of more expensive ones (e.g. string regex, ...).
  2. High impact filters (e.g. filters that you expect will drop a lot of data) should go to the left of low impact filters.

Time complexity O(m * n) where m = number of columns to filter by, n = number of rows.
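The effect of filter ordering can be seen in a small row-at-a-time model, where each predicate only sees rows that survived the predicates to its left. This is a simplified sketch with hypothetical names (row, andFilter, regexCalls) and is not the actual qframe filtering code; it counts how often the expensive regex predicate runs for each ordering.

```go
package main

import (
	"fmt"
	"regexp"
)

// row is a hypothetical two-column record.
type row struct {
	id   int
	name string
}

var (
	sampleRows  = []row{{1, "alpha"}, {2, "beta"}, {3, "gamma"}, {4, "delta"}}
	namePattern = regexp.MustCompile(`^d.*a$`)
)

// andFilter keeps the rows accepted by every predicate. Predicates run left to
// right and each one is only evaluated on rows kept by the previous ones.
func andFilter(rows []row, preds ...func(row) bool) []row {
	kept := rows
	for _, p := range preds {
		var next []row
		for _, r := range kept {
			if p(r) {
				next = append(next, r)
			}
		}
		kept = next
	}
	return kept
}

// regexCalls reports how many times the expensive predicate is evaluated
// for the chosen ordering of the two predicates.
func regexCalls(rows []row, cheapFirst bool) int {
	calls := 0
	cheap := func(r row) bool { return r.id > 3 }
	expensive := func(r row) bool {
		calls++
		return namePattern.MatchString(r.name)
	}
	if cheapFirst {
		andFilter(rows, cheap, expensive)
	} else {
		andFilter(rows, expensive, cheap)
	}
	return calls
}

func main() {
	fmt.Println(regexCalls(sampleRows, true))  // 1
	fmt.Println(regexCalls(sampleRows, false)) // 4
}
```

Both orderings keep the same single row, but placing the cheap, high impact filter first reduces the number of regex evaluations from four to one in this example.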

func (QFrame) FilteredApply

func (qf QFrame) FilteredApply(clause FilterClause, instructions ...Instruction) QFrame

FilteredApply works like Apply but allows adding a filter which limits the rows to which the instructions are applied. Any rows not matching the filter will be assigned the zero value of the column type.

Time complexity O(m * n), where m = number of instructions, n = number of rows.
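The zero value behaviour can be sketched on a plain int slice. The filteredApply function below is a hypothetical stand-in written for illustration, not the QFrame method itself.

```go
package main

import "fmt"

// filteredApply applies fn to the values accepted by keep.
// All other positions keep the zero value of the element type (0 for int).
func filteredApply(src []int, keep func(int) bool, fn func(int) int) []int {
	dst := make([]int, len(src)) // make initialises every element to zero
	for i, v := range src {
		if keep(v) {
			dst[i] = fn(v)
		}
	}
	return dst
}

func main() {
	isEven := func(x int) bool { return x%2 == 0 }
	timesTen := func(x int) int { return x * 10 }
	fmt.Println(filteredApply([]int{1, 2, 3, 4}, isEven, timesTen)) // [0 20 0 40]
}
```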

func (QFrame) FloatView

func (qf QFrame) FloatView(colName string) (FloatView, error)

FloatView returns a view into a float column identified by name.

colName - Name of the column.

Returns an error if the column is missing or of wrong type. Time complexity O(1).

func (QFrame) GroupBy

func (qf QFrame) GroupBy(configFns ...groupby.ConfigFunc) Grouper

GroupBy groups rows together for which the values of specified columns are the same. Aggregations on the groups can be executed on the returned Grouper object. Leaving out columns to group by will make one large group over which aggregations can be done.

The order of the rows in the Grouper is undefined.

Time complexity O(m * n) where m = number of columns to group by, n = number of rows.
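The two steps, grouping followed by aggregation with a function that reduces a slice to a single element, can be sketched with the standard library only. The groupBy and aggregate helpers are hypothetical and use a map for simplicity; they are not the Grouper implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// groupBy returns, for each distinct key, the row positions holding that key.
func groupBy(keys []string) map[string][]int {
	groups := make(map[string][]int)
	for i, k := range keys {
		groups[k] = append(groups[k], i)
	}
	return groups
}

// aggregate collects the values of each group into a slice and reduces it with fn.
func aggregate(groups map[string][]int, vals []int, fn func([]int) int) map[string]int {
	out := make(map[string]int, len(groups))
	for k, rows := range groups {
		buf := make([]int, len(rows))
		for i, r := range rows {
			buf[i] = vals[r]
		}
		out[k] = fn(buf)
	}
	return out
}

// intSum has the slice-in, element-out shape used in the aggregation step.
func intSum(xx []int) int {
	result := 0
	for _, x := range xx {
		result += x
	}
	return result
}

func main() {
	keys := []string{"a", "b", "c", "a", "b"}
	vals := []int{1, 2, 2, 3, 3}
	res := aggregate(groupBy(keys), vals, intSum)

	// The order of the groups is undefined, so sort the keys before printing.
	names := make([]string, 0, len(res))
	for k := range res {
		names = append(names, k)
	}
	sort.Strings(names)
	for _, k := range names {
		fmt.Println(k, res[k])
	}
}
```

As with the Grouper, the group order is undefined here, which is why the keys are sorted explicitly before printing.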

func (QFrame) IntView

func (qf QFrame) IntView(colName string) (IntView, error)

IntView returns a view into an int column identified by name.

colName - Name of the column.

Returns an error if the column is missing or of wrong type. Time complexity O(1).

func (QFrame) Len

func (qf QFrame) Len() int

Len returns the number of rows in the QFrame.

Time complexity O(1).

func (QFrame) MustBoolView

func (qf QFrame) MustBoolView(colName string) BoolView

MustBoolView returns a view into a bool column identified by name.

colName - Name of the column.

Panics if the column is missing or of wrong type. Time complexity O(1).

func (QFrame) MustEnumView

func (qf QFrame) MustEnumView(colName string) EnumView

MustEnumView returns a view into an enum column identified by name.

colName - Name of the column.

Panics if the column is missing or of wrong type. Time complexity O(1).

func (QFrame) MustFloatView

func (qf QFrame) MustFloatView(colName string) FloatView

MustFloatView returns a view into a float column identified by name.

colName - Name of the column.

Panics if the column is missing or of wrong type. Time complexity O(1).

func (QFrame) MustIntView

func (qf QFrame) MustIntView(colName string) IntView

MustIntView returns a view into an int column identified by name.

colName - Name of the column.

Panics if the column is missing or of wrong type. Time complexity O(1).

func (QFrame) MustStringView

func (qf QFrame) MustStringView(colName string) StringView

MustStringView returns a view into a string column identified by name.

colName - Name of the column.

Panics if the column is missing or of wrong type. Time complexity O(1).

func (QFrame) Select

func (qf QFrame) Select(columns ...string) QFrame

Select creates a new projection of the QFrame containing only the specified columns.

Time complexity O(1).

func (QFrame) Slice

func (qf QFrame) Slice(start, end int) QFrame

Slice returns a new QFrame consisting of the rows in the half-open interval [start, end), i.e. start is included and end is excluded. Note that the underlying storage is kept; slicing a frame will not release memory used to store the columns.

Time complexity O(1).
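Go's built-in slices behave in a comparable way, which makes them a convenient analogy: a sub-slice covers a half-open range and keeps the original backing array alive. The window and compact helpers below are hypothetical and operate on plain slices, not on QFrames.

```go
package main

import "fmt"

// window returns the elements [start, end) of data without copying.
// The result shares its backing array with data.
func window(data []int, start, end int) []int {
	return data[start:end]
}

// compact copies a window into fresh storage so that the original
// backing array is no longer referenced by the result.
func compact(w []int) []int {
	out := make([]int, len(w))
	copy(out, w)
	return out
}

func main() {
	data := []int{10, 20, 30, 40, 50}

	w := window(data, 1, 3)
	fmt.Println(w, len(w), cap(w)) // [20 30] 2 4

	c := compact(w)
	fmt.Println(c, len(c), cap(c)) // [20 30] 2 2
}
```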

func (QFrame) Sort

func (qf QFrame) Sort(orders ...Order) QFrame

Sort returns a new QFrame sorted according to the orders specified.

Time complexity O(m * n * log(n)) where m = number of columns to sort by, n = number of rows in QFrame.

func (QFrame) String

func (qf QFrame) String() string

String returns a simple string representation of the table. Column type is indicated in parentheses following the column name. The initial letter in the type name is used for this. Output is currently capped to 50 rows. Use Slice followed by String if you want to print rows that are not among the first 50.

func (QFrame) StringView

func (qf QFrame) StringView(colName string) (StringView, error)

StringView returns a view into a string column identified by name.

colName - Name of the column.

Returns an error if the column is missing or of wrong type. Time complexity O(1).

func (QFrame) ToCSV

func (qf QFrame) ToCSV(writer io.Writer) error

ToCSV writes the data in the QFrame, in CSV format, to writer.

Time complexity O(m * n) where m = number of rows, n = number of columns.

This function is currently unoptimized. It could probably be a lot speedier with a custom written CSV writer that handles quoting etc. differently.

func (QFrame) ToJSON

func (qf QFrame) ToJSON(writer io.Writer) error

ToJSON writes the data in the QFrame, in JSON format one record per row, to writer.

Time complexity O(m * n) where m = number of rows, n = number of columns.

func (QFrame) ToSQL

func (qf QFrame) ToSQL(tx *sql.Tx, confFuncs ...qsql.ConfigFunc) error

ToSQL writes a QFrame into a SQL database.

type StringView

type StringView struct {
	scolumn.View
}

StringView provides a "view" into a string column and can be used for access to individual elements.

Directories

Path Synopsis
cmd
config: Package config acts as a base package for different configuration options used when creating or working with QFrames.
csv
sql
contrib
gonum/qplot: package qplot provides compatibility between QFrame and gonum.org/v1/plot
function: Package function contains example functions that can be used in QFrame.Apply and QFrame.Eval.
internal
io