seafan

package module

v0.0.30 (latest). Published: Oct 26, 2022. License: Apache-2.0. Imports: 20. Imported by: 2.

Repository

github.com/invertedv/seafan


README ¶

Seafan

Package seafan is a set of tools for building DNN models. The build engine is gorgonia.

Seafan features:

A data pipeline based on chutils to access files and ClickHouse tables.
- Point-and-shoot specification of the data
- Simple specification of one-hot features
A wrapper around gorgonia that meshes to the pipeline.
- Simple specification of models, including embeddings
- A fit method with optional early stopping
- Callbacks during model fit
- Saving and loading models
Model diagnostics for categorical targets.
- KS plots
- Decile plots
Utilities.
- Plotting wrapper for plotly for xy plots.
- Numeric struct for (x,y) data and plotting and descriptive statistics.

Documentation ¶

Overview ¶

Package seafan is a set of tools for building DNN models. The build engine is gorgonia (https://pkg.go.dev/gorgonia.org/gorgonia).

Seafan features:

- A data pipeline based on chutils (https://github.com/invertedv/chutils) to access files and ClickHouse tables.
  - Point-and-shoot specification of the data
  - Simple specification of one-hot features
- A wrapper around gorgonia that meshes to the pipeline.
  - Simple specification of models, including embeddings
  - A fit method with optional early stopping and callbacks
  - Saving and loading models
- Model diagnostics for categorical targets.
  - KS plots
  - Decile plots
- Utilities.
  - Plotting wrapper for plotly (https://github.com/MetalBlueberry/go-plotly) for xy plots.
  - Numeric struct for (x,y) data and plotting and descriptive statistics.

Index ¶

Variables
func AddFitted(pipeIn Pipeline, nnFile string, target []int, name string, fts FTypes, ...) error
func AnyLess(x, y any) (bool, error)
func Coalesce(vals []float64, nCat int, trg []int, binary, logodds bool, sl Slicer) ([]float64, error)
func CrossEntropy(model *NNModel) (cost *G.Node)
func Decile(xy *XY, plt *PlotDef) error
func GetNode(ns G.Nodes, name string) *G.Node
func KS(xy *XY, plt *PlotDef) (ks float64, notTarget *Desc, target *Desc, err error)
func LeakyReluAct(n *G.Node, alpha float64) *G.Node
func LinearAct(n *G.Node) *G.Node
func Marginal(nnFile string, feat string, target []int, pipe Pipeline, pd *PlotDef, ...) error
func Max(a, b int) int
func Min(a, b int) int
func Plotter(fig *grob.Fig, lay *grob.Layout, pd *PlotDef) error
func R2(y, yhat []float64) float64
func RMS(model *NNModel) (cost *G.Node)
func ReluAct(n *G.Node) *G.Node
func SegPlot(pipe Pipeline, obs, fit, seg string, plt *PlotDef, minVal, maxVal *float64) error
func SigmoidAct(n *G.Node) *G.Node
func SoftMaxAct(n *G.Node) *G.Node
func SoftRMS(model *NNModel) (cost *G.Node)
func Strip(s string) (left, inner string, err error)
func UnNormalize(vals []float64, ft *FType) (unNorm []float64)
func Unique(xs []any) []any
func Wrapper(e error, text string) error
type Activation
- func StrAct(s string) (*Activation, float64)
- func (i Activation) String() string
type Args
- func MakeArgs(s string) (keyval Args, err error)
- func (kv Args) Get(key string, kind reflect.Kind) (val any)
type ChData
- func NewChData(name string, opts ...Opts) *ChData
- func (ch *ChData) Batch(inputs G.Nodes) bool
- func (ch *ChData) BatchSize() int
- func (ch *ChData) Cols(field string) int
- func (ch *ChData) Describe(field string, topK int) string
- func (ch *ChData) Epoch(setTo int) int
- func (ch *ChData) FieldList() []string
- func (ch *ChData) GData() *GData
- func (ch *ChData) Get(field string) *GDatum
- func (ch *ChData) GetFType(field string) *FType
- func (ch *ChData) GetFTypes() FTypes
- func (ch *ChData) Init() (err error)
- func (ch *ChData) IsCat(field string) bool
- func (ch *ChData) IsCts(field string) bool
- func (ch *ChData) IsNormalized(field string) bool
- func (ch *ChData) IsSorted() bool
- func (ch *ChData) Name() string
- func (ch *ChData) Rows() int
- func (ch *ChData) SaveFTypes(fileName string) error
- func (ch *ChData) Shuffle()
- func (ch *ChData) Slice(sl Slicer) (Pipeline, error)
- func (ch *ChData) Sort(field string, ascending bool) error
- func (ch *ChData) SortField() string
- func (ch *ChData) String() string
type CostFunc
type DOLayer
- func DropOutParse(s string) (*DOLayer, error)
type Desc
- func Assess(xy *XY, cutoff float64) (n int, precision, recall, accuracy float64, obs, fit *Desc, err error)
- func NewDesc(u []float64, name string) (*Desc, error)
- func (d *Desc) Populate(x []float64, noSort bool, sl Slicer)
- func (d *Desc) String() string
type FCLayer
- func FCParse(s string) (fc *FCLayer, err error)
type FParam
type FRole
- func (i FRole) String() string
type FType
- func (ft *FType) String() string
type FTypes
- func LoadFTypes(fileName string) (fts FTypes, err error)
- func (fts FTypes) DropFields(dropFields ...string) FTypes
- func (fts FTypes) Get(name string) *FType
- func (fts FTypes) Save(fileName string) (err error)
type Fit
- func NewFit(nn *NNModel, epochs int, p Pipeline, opts ...FitOpts) *Fit
- func (ft *Fit) BestEpoch() int
- func (ft *Fit) Do() (err error)
- func (ft *Fit) InCosts() *XY
- func (ft *Fit) NNModel() *NNModel
- func (ft *Fit) OutCosts() *XY
- func (ft *Fit) OutFile() string
type FitOpts
- func WithL2Reg(penalty float64) FitOpts
- func WithLearnRate(lrStart, lrEnd float64) FitOpts
- func WithOutFile(fileName string) FitOpts
- func WithShuffle(interval int) FitOpts
- func WithValidation(p Pipeline, wait int) FitOpts
type GData
- func NewGData() *GData
- func (gd *GData) AppendC(raw *Raw, name string, normalize bool, fp *FParam) error
- func (gd *GData) AppendD(raw *Raw, name string, fp *FParam) error
- func (gd *GData) AppendField(newData *Raw, name string, fRole FRole) error
- func (gd *GData) Close() error
- func (gd *GData) CountLines() (numLines int, err error)
- func (gd *GData) Drop(field string)
- func (gd *GData) FieldCount() int
- func (gd *GData) FieldList() []string
- func (gd *GData) Get(name string) *GDatum
- func (gd *GData) GetRaw(field string) (*Raw, error)
- func (gd *GData) IsSorted() bool
- func (gd *GData) Len() int
- func (gd *GData) Less(i, j int) bool
- func (gd *GData) MakeOneHot(from, name string) error
- func (gd *GData) Read(nTarget int, validate bool) (data []chutils.Row, valid []chutils.Valid, err error)
- func (gd *GData) Reset() error
- func (gd *GData) Rows() int
- func (gd *GData) Seek(lineNo int) error
- func (gd *GData) Shuffle()
- func (gd *GData) Slice(sl Slicer) (*GData, error)
- func (gd *GData) Sort(field string, ascending bool) error
- func (gd *GData) SortField() string
- func (gd *GData) Swap(i, j int)
- func (gd *GData) TableSpec() *chutils.TableDef
- func (gd *GData) UpdateFts(newFts FTypes) (*GData, error)
type GDatum
- func (g *GDatum) Describe(topK int) string
- func (g *GDatum) String() string
type Layer
- func (i Layer) String() string
type Levels
- func ByCounts(data *Raw, sl Slicer) Levels
- func ByPtr(data *Raw) Levels
- func (l Levels) FindValue(val int32) any
- func (l Levels) Sort(byName, ascend bool) (key []any, val []int32)
- func (l Levels) TopK(topNum int, byName, ascend bool) string
type ModSpec
- func LoadModSpec(fileName string) (ms ModSpec, err error)
- func (m ModSpec) Check() error
- func (m ModSpec) DropOut(loc int) *DOLayer
- func (m ModSpec) FC(loc int) *FCLayer
- func (m ModSpec) Inputs(p Pipeline) (FTypes, error)
- func (m ModSpec) LType(i int) (*Layer, error)
- func (m ModSpec) Save(fileName string) (err error)
- func (m ModSpec) Target(p Pipeline) (*FType, error)
- func (m ModSpec) TargetName() string
type NNModel
- func LoadNN(fileRoot string, p Pipeline, build bool) (nn *NNModel, err error)
- func NewNNModel(modSpec ModSpec, pipe Pipeline, build bool, nnOpts ...NNOpts) (*NNModel, error)
- func PredictNN(fileRoot string, pipe Pipeline, build bool, opts ...NNOpts) (nn *NNModel, err error)
- func PredictNNwFts(fileRoot string, pipe Pipeline, build bool, fts FTypes, opts ...NNOpts) (nn *NNModel, err error)
- func (m *NNModel) Cols() int
- func (m *NNModel) Cost() *G.Node
- func (m *NNModel) CostFlt() float64
- func (m *NNModel) CostFn() CostFunc
- func (m *NNModel) Features() G.Nodes
- func (m *NNModel) FitSlice() []float64
- func (m *NNModel) Fitted() G.Result
- func (m *NNModel) Fwd()
- func (m *NNModel) G() *G.ExprGraph
- func (m *NNModel) InputFT() FTypes
- func (m *NNModel) Inputs() G.Nodes
- func (m *NNModel) ModSpec() ModSpec
- func (m *NNModel) Name() string
- func (m *NNModel) Obs() *G.Node
- func (m *NNModel) ObsSlice() []float64
- func (m *NNModel) Opts() []NNOpts
- func (m *NNModel) OutputCols() int
- func (m *NNModel) Params() G.Nodes
- func (m *NNModel) Save(fileRoot string) (err error)
- func (m *NNModel) String() string
type NNOpts
- func WithCostFn(cf CostFunc) NNOpts
- func WithName(name string) NNOpts
type Opts
- func WithBatchSize(bsize int) Opts
- func WithCallBack(cb Opts) Opts
- func WithCats(names ...string) Opts
- func WithCycle(cycle bool) Opts
- func WithFtypes(fts FTypes) Opts
- func WithNormalized(names ...string) Opts
- func WithOneHot(name, from string) Opts
- func WithReader(rdr any) Opts
type Pipeline
type PlotDef
type Raw
- func AllocRaw(n int, kind reflect.Kind) *Raw
- func NewRaw(x []any, sl Slicer) *Raw
- func NewRawCast(x any, sl Slicer) *Raw
- func (r *Raw) Len() int
- func (r *Raw) Less(i, j int) bool
- func (r *Raw) Swap(i, j int)
type SeaError
- func (seaErr SeaError) Error() string
type Slice
- func NewSlice(feat string, minCnt int, pipe Pipeline, restrict []any) (*Slice, error)
- func (s *Slice) Index() int32
- func (s *Slice) Iter() bool
- func (s *Slice) MakeSlicer() Slicer
- func (s *Slice) Title() string
- func (s *Slice) Value() any
type Slicer
- func SlicerAnd(s1, s2 Slicer) Slicer
- func SlicerOr(s1, s2 Slicer) Slicer
type Summary
type VecData
- func NewVecData(name string, data *GData, opts ...Opts) *VecData
- func (vec *VecData) Batch(inputs G.Nodes) bool
- func (vec *VecData) BatchSize() int
- func (vec *VecData) Cols(field string) int
- func (vec *VecData) Describe(field string, topK int) string
- func (vec *VecData) Epoch(setTo int) int
- func (vec *VecData) FieldList() []string
- func (vec *VecData) GData() *GData
- func (vec *VecData) Get(field string) *GDatum
- func (vec *VecData) GetFType(field string) *FType
- func (vec *VecData) GetFTypes() FTypes
- func (vec *VecData) Init() error
- func (vec *VecData) IsCat(field string) bool
- func (vec *VecData) IsCts(field string) bool
- func (vec *VecData) IsNormalized(field string) bool
- func (vec *VecData) IsSorted() bool
- func (vec *VecData) Name() string
- func (vec *VecData) Rows() int
- func (vec *VecData) SaveFTypes(fileName string) error
- func (vec *VecData) Shuffle()
- func (vec *VecData) Slice(sl Slicer) (Pipeline, error)
- func (vec *VecData) Sort(field string, ascending bool) error
- func (vec *VecData) SortField() string
- func (vec *VecData) String() string
type XY
- func NewXY(x, y []float64) (*XY, error)
- func (p *XY) Interp(xNew []float64) (*XY, error)
- func (p *XY) Len() int
- func (p *XY) Less(i, j int) bool
- func (p *XY) Plot(pd *PlotDef, scatter bool) error
- func (p *XY) Sort() error
- func (p *XY) String() string
- func (p *XY) Swap(i, j int)

Constants ¶

This section is empty.

Variables ¶


var Browser = "firefox"

Browser is the browser to use for plotting.


var Verbose = true

Verbose controls the amount of printing.

Functions ¶

func AddFitted ¶

func AddFitted(pipeIn Pipeline, nnFile string, target []int, name string, fts FTypes, logodds bool, obsFit *FType) error

AddFitted adds fitted values to a Pipeline. The features can be re-normalized/re-mapped to align pipeIn with the model build.

pipeIn    input Pipeline to run the model on
nnFile    root directory of NNModel
target    target columns of the model output to coalesce
name      name of fitted value in Pipeline
fts       optional FTypes to use for normalizing pipeIn

func AnyLess ¶

func AnyLess(x, y any) (bool, error)

AnyLess returns x < y for selected underlying types of any.

func Coalesce ¶

func Coalesce(vals []float64, nCat int, trg []int, binary, logodds bool, sl Slicer) ([]float64, error)

Coalesce combines columns of either a one-hot feature or a softmax output. In the case of a feature, it returns 1 if any of the target columns is 1. In the case of a softmax output, it sums the entries.
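The two coalescing rules can be illustrated in plain Go. This is a minimal sketch, not seafan's implementation: the helper coalesceRows is hypothetical, it assumes vals stores rows of nCat columns in row-major order, and it ignores the logodds and Slicer arguments of Coalesce.

```go
package main

import (
	"errors"
	"fmt"
)

// coalesceRows is a hypothetical helper, not part of seafan. It assumes vals
// holds rows of nCat columns stored row-major. For a softmax output it sums
// the columns listed in trg; with binary=true (a one-hot feature) it returns
// 1 when any target column equals 1.
func coalesceRows(vals []float64, nCat int, trg []int, binary bool) ([]float64, error) {
	if nCat <= 0 || len(vals)%nCat != 0 {
		return nil, errors.New("len(vals) must be a multiple of nCat")
	}
	for _, t := range trg {
		if t < 0 || t >= nCat {
			return nil, fmt.Errorf("target column %d out of range", t)
		}
	}
	nRows := len(vals) / nCat
	out := make([]float64, nRows)
	for r := 0; r < nRows; r++ {
		row := vals[r*nCat : (r+1)*nCat]
		for _, t := range trg {
			if binary {
				if row[t] == 1 {
					out[r] = 1
				}
				continue
			}
			out[r] += row[t]
		}
	}
	return out, nil
}

func main() {
	// two rows of a 3-class softmax output; combine classes 1 and 2
	probs := []float64{0.5, 0.25, 0.25, 0.125, 0.375, 0.5}
	out, err := coalesceRows(probs, 3, []int{1, 2}, false)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // [0.5 0.875]
}
```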

func CrossEntropy ¶

func CrossEntropy(model *NNModel) (cost *G.Node)

CrossEntropy cost function

func Decile ¶

func Decile(xy *XY, plt *PlotDef) error

Decile generates a decile plot based on xy.

xy        XY values to base the plot on.
plt       PlotDef plot options.  If plt is nil, an error is generated.

The deciles are created based on the values of xy.X.
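The grouping behind a decile plot can be sketched as follows, assuming the plot compares the average x (fitted) and average y (observed) within each tenth of the data ordered by x. The type decileMean and function decileMeans are hypothetical and only illustrate the bucketing; they are not seafan's code.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// decileMean is a hypothetical type holding the average x and y in one decile.
type decileMean struct{ MeanX, MeanY float64 }

// decileMeans sorts the (x, y) pairs by x, splits them into ten groups of
// (nearly) equal size and averages x and y within each group.
func decileMeans(x, y []float64) ([]decileMean, error) {
	n := len(x)
	if n != len(y) || n < 10 {
		return nil, errors.New("need equal-length x and y with at least 10 points")
	}
	idx := make([]int, n)
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return x[idx[a]] < x[idx[b]] })

	out := make([]decileMean, 10)
	for d := 0; d < 10; d++ {
		lo, hi := d*n/10, (d+1)*n/10 // each group has at least one point since n >= 10
		for _, i := range idx[lo:hi] {
			out[d].MeanX += x[i]
			out[d].MeanY += y[i]
		}
		cnt := float64(hi - lo)
		out[d].MeanX /= cnt
		out[d].MeanY /= cnt
	}
	return out, nil
}

func main() {
	x := make([]float64, 20)
	y := make([]float64, 20)
	for i := range x {
		x[i] = float64(i)
		y[i] = 2 * float64(i)
	}
	d, err := decileMeans(x, y)
	if err != nil {
		panic(err)
	}
	fmt.Println(d[0], d[9]) // {0.5 1} {18.5 37}
}
```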

func GetNode ¶

func GetNode(ns G.Nodes, name string) *G.Node

GetNode returns a node by name from a G.Nodes

func KS ¶

func KS(xy *XY, plt *PlotDef) (ks float64, notTarget *Desc, target *Desc, err error)

KS finds the KS statistic of a softmax model that is reduced to a binary outcome.

xy        XY struct where x is the fitted value and y is the binary observed value
plt       PlotDef plot options.  If plt is nil, no plot is produced.

The KS statistic is returned, as are Desc descriptions of the fitted values for the two groups. Returns:

ks         KS statistic
notTarget  Desc struct of fitted values of the non-target outcomes
target     Desc struct of fitted values of target outcomes

Plot destination: an HTML plot file and/or a plot in the browser.
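The KS statistic here is the standard two-sample Kolmogorov-Smirnov distance: the largest gap between the empirical CDFs of the fitted values for target and non-target outcomes. The sketch below computes it on the 0-1 scale; ksStat is a hypothetical helper, and seafan may report the value on a different scale (for example, as a percentage).

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sort"
)

// ksStat is a hypothetical helper: fit holds fitted values and obs the
// binary (0/1) outcomes. It returns the largest gap between the empirical
// CDFs of the fitted values for the target (obs == 1) and non-target groups.
func ksStat(fit, obs []float64) (float64, error) {
	if len(fit) != len(obs) {
		return 0, errors.New("fit and obs must have equal length")
	}
	var tgt, non []float64
	for i, f := range fit {
		if obs[i] == 1 {
			tgt = append(tgt, f)
		} else {
			non = append(non, f)
		}
	}
	if len(tgt) == 0 || len(non) == 0 {
		return 0, errors.New("both groups must be non-empty")
	}
	sort.Float64s(tgt)
	sort.Float64s(non)

	// once either group is exhausted the gap can only shrink, so the loop may stop
	ks, i, j := 0.0, 0, 0
	for i < len(tgt) && j < len(non) {
		// step both CDFs past every value equal to the next smallest value
		v := math.Min(tgt[i], non[j])
		for i < len(tgt) && tgt[i] <= v {
			i++
		}
		for j < len(non) && non[j] <= v {
			j++
		}
		gap := math.Abs(float64(i)/float64(len(tgt)) - float64(j)/float64(len(non)))
		ks = math.Max(ks, gap)
	}
	return ks, nil
}

func main() {
	ks, err := ksStat([]float64{0.1, 0.2, 0.3, 0.4}, []float64{1, 0, 1, 0})
	if err != nil {
		panic(err)
	}
	fmt.Println(ks) // 0.5
}
```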

func LeakyReluAct ¶

func LeakyReluAct(n *G.Node, alpha float64) *G.Node

LeakyReluAct is the leaky ReLU activation; alpha is the slope applied to negative inputs.

func LinearAct ¶

func LinearAct(n *G.Node) *G.Node

LinearAct is a no-op. It is the default ModSpec activation.

func Marginal ¶

func Marginal(nnFile string, feat string, target []int, pipe Pipeline, pd *PlotDef, obsFtype *FType) error

Marginal produces a set of plots to aid in understanding the effect of a feature. The plot takes the model output and creates six segments based on the quantiles of the model output: (<.1, .1-.25, .25-.5, .5-.75, .75-.9, .9-1).

For each segment, the feature being analyzed varies across its range within the quantile (continuous features) or across its values (discrete features). The bottom row shows the distribution of the feature within the quantile range.
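Assigning observations to the six quantile segments listed above can be sketched like this. The helper segments is hypothetical and uses a simple lower-index rule on the sorted data to pick quantile boundaries; seafan's own quantile calculation may differ in how it interpolates and handles ties.

```go
package main

import (
	"fmt"
	"sort"
)

// segCuts are the quantile boundaries listed for Marginal:
// <.1, .1-.25, .25-.5, .5-.75, .75-.9, .9-1.
var segCuts = []float64{0.1, 0.25, 0.5, 0.75, 0.9}

// segments is a hypothetical helper that assigns each value of x to one of
// six quantile segments (0..5).
func segments(x []float64) []int {
	if len(x) == 0 {
		return nil
	}
	s := append([]float64(nil), x...)
	sort.Float64s(s)
	bounds := make([]float64, len(segCuts))
	for k, u := range segCuts {
		bounds[k] = s[int(u*float64(len(s)-1))] // lower-index empirical quantile
	}
	seg := make([]int, len(x))
	for i, v := range x {
		// index of the first boundary >= v; values on a boundary go to the lower segment
		seg[i] = sort.SearchFloat64s(bounds, v)
	}
	return seg
}

func main() {
	x := make([]float64, 100)
	for i := range x {
		x[i] = float64(i)
	}
	seg := segments(x)
	fmt.Println(seg[0], seg[10], seg[50], seg[99]) // 0 1 3 5
}
```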

func Max ¶

func Max(a, b int) int

Max returns the larger of a and b.

func Min ¶

func Min(a, b int) int

Min returns the smaller of a and b.

func Plotter ¶

func Plotter(fig *grob.Fig, lay *grob.Layout, pd *PlotDef) error

Plotter plots the Plotly Figure fig with Layout lay. The layout is augmented by features I commonly use.

fig      plotly figure
lay      plotly layout (nil is OK)
pd       PlotDef structure with plot options.

lay can be initialized with any additional layout options needed.

func R2 ¶ added in v0.0.30

func R2(y, yhat []float64) float64

R2 returns the model R-square. It returns -1 on error.
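A common definition of R-square is 1 - SSE/SST, where SSE is the sum of squared residuals and SST the total sum of squares about the mean of y. The sketch below assumes that definition; r2 is a hypothetical helper, and treating constant y as an error is an assumption about what counts as "an error".

```go
package main

import "fmt"

// r2 is a hypothetical helper computing 1 - SSE/SST. It returns -1 on error
// (empty or mismatched inputs, or constant y), mirroring the documented
// behavior of R2.
func r2(y, yhat []float64) float64 {
	if len(y) == 0 || len(y) != len(yhat) {
		return -1
	}
	mean := 0.0
	for _, v := range y {
		mean += v
	}
	mean /= float64(len(y))

	var sse, sst float64
	for i := range y {
		sse += (y[i] - yhat[i]) * (y[i] - yhat[i])
		sst += (y[i] - mean) * (y[i] - mean)
	}
	if sst == 0 {
		return -1 // R-square is undefined when y is constant
	}
	return 1 - sse/sst
}

func main() {
	// predicting the mean everywhere gives R-square 0
	fmt.Println(r2([]float64{1, 2, 3}, []float64{2, 2, 2})) // 0
}
```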

func RMS ¶

func RMS(model *NNModel) (cost *G.Node)

RMS cost function

func ReluAct ¶

func ReluAct(n *G.Node) *G.Node

ReluAct is the ReLU activation.

func SegPlot ¶ added in v0.0.29

func SegPlot(pipe Pipeline, obs, fit, seg string, plt *PlotDef, minVal, maxVal *float64) error

SegPlot generates a decile plot of the fields obs and fit in pipe. The segments are based on the values of the field seg. If seg is continuous, the segments are based on quantiles: 0-.1, .1-.25, .25-.5, .5-.75, .75-.9, .9-1.

obs       observed field (y-axis) name
fit       fitted field (x-axis) name
seg       segmenting field name
plt       PlotDef plot options.  If plt is nil, an error is generated.

func SigmoidAct ¶

func SigmoidAct(n *G.Node) *G.Node

SigmoidAct is the sigmoid activation.

func SoftMaxAct ¶

func SoftMaxAct(n *G.Node) *G.Node

SoftMaxAct implements the softmax activation function.

func SoftRMS ¶

func SoftRMS(model *NNModel) (cost *G.Node)

func Strip ¶

func Strip(s string) (left, inner string, err error)

Strip is a utility that takes a string of the form "Func(args)" and returns "Func" and "args"
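The splitting that Strip describes can be sketched with the standard strings package. The function strip below is a hypothetical re-implementation for illustration; seafan's own Strip may treat whitespace and malformed input differently.

```go
package main

import (
	"fmt"
	"strings"
)

// strip is a hypothetical re-implementation of the documented behavior:
// "Func(args)" -> "Func", "args".
func strip(s string) (left, inner string, err error) {
	s = strings.TrimSpace(s)
	open := strings.Index(s, "(")
	if open < 0 || !strings.HasSuffix(s, ")") {
		return "", "", fmt.Errorf("strip: %q is not of the form Func(args)", s)
	}
	// s ends in ')' and s[open] is '(', so open < len(s)-1 and the slice is valid
	return strings.TrimSpace(s[:open]), s[open+1 : len(s)-1], nil
}

func main() {
	left, inner, err := strip("FC(size:3, activation:relu)")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s | %s\n", left, inner) // FC | size:3, activation:relu
}
```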

func UnNormalize ¶ added in v0.0.29

func UnNormalize(vals []float64, ft *FType) (unNorm []float64)

UnNormalize un-normalizes a slice if the field described by ft is normalized.
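A minimal sketch of the round trip, assuming normalization is z = (x - Location)/Scale using FParam's Location and Scale, so that un-normalizing is x = z*Scale + Location. The type fParam and both functions are hypothetical stand-ins, not seafan's API.

```go
package main

import "fmt"

// fParam is a hypothetical stand-in for the Location/Scale part of seafan's FParam.
type fParam struct{ Location, Scale float64 }

// normalize assumes the transform z = (x - Location) / Scale.
func normalize(vals []float64, fp fParam) []float64 {
	out := make([]float64, len(vals))
	for i, v := range vals {
		out[i] = (v - fp.Location) / fp.Scale
	}
	return out
}

// unNormalize inverts normalize: x = z*Scale + Location.
func unNormalize(vals []float64, fp fParam) []float64 {
	out := make([]float64, len(vals))
	for i, v := range vals {
		out[i] = v*fp.Scale + fp.Location
	}
	return out
}

func main() {
	fp := fParam{Location: 40, Scale: 2}
	z := normalize([]float64{42, 44}, fp)
	fmt.Println(z, unNormalize(z, fp)) // [1 2] [42 44]
}
```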

func Unique ¶

func Unique(xs []any) []any

Unique returns a slice of the unique values of xs

func Wrapper ¶

func Wrapper(e error, text string) error

Types ¶

type Activation ¶

type Activation int

Activation types

const (
	Linear Activation = 0 + iota
	Relu
	LeakyRelu
	Sigmoid
	SoftMax
)

func StrAct ¶

func StrAct(s string) (*Activation, float64)

StrAct takes a string and returns the corresponding Activation and any parameter. It returns nil if it fails.

func (Activation) String ¶

func (i Activation) String() string

type Args ¶

type Args map[string]string

Args map holds layer arguments in key/val style

func MakeArgs ¶

func MakeArgs(s string) (keyval Args, err error)

MakeArgs takes an argument string of the form "arg1:val1, arg2:val2, ...." and returns entries in key/val format

func (Args) Get ¶

func (kv Args) Get(key string, kind reflect.Kind) (val any)

Get returns a val from Args, coercing it to type kind. It returns nil if it fails.

type ChData ¶

type ChData struct {
	// contains filtered or unexported fields
}

ChData provides a Pipeline interface into text files (delimited, fixed length) and ClickHouse.

func NewChData ¶

func NewChData(name string, opts ...Opts) *ChData

func (*ChData) Batch ¶

func (ch *ChData) Batch(inputs G.Nodes) bool

Batch loads a batch into inputs. It returns false if the epoch is done. If cycle is true, it will start at the beginning on the next call. If cycle is false, it will call Init() at the next call to Batch().

Example ¶

dataPath := os.Getenv("data") // path to data directory
fileName := dataPath + "/test1.csv"
f, e := os.Open(fileName)

if e != nil {
	panic(e)
}
// set up chutils file reader
rdr := file.NewReader(fileName, ',', '\n', 0, 0, 1, 0, f, 0)
e = rdr.Init("", chutils.MergeTree)

if e != nil {
	panic(e)
}

// determine data types
e = rdr.TableSpec().Impute(rdr, 0, .99)

if e != nil {
	panic(e)
}

bSize := 100
ch := NewChData("Test ch Pipeline",
	WithBatchSize(bSize),
	WithReader(rdr),
	WithNormalized("x1"))
// create a graph & node to illustrate Batch()
g := G.NewGraph()
node := G.NewTensor(g, G.Float64, 2, G.WithName("x1"), G.WithShape(bSize, 1), G.WithInit(G.Zeroes()))

var sumX = 0.0
n := 0
// run through the batches, counting rows and summing x1; the mean of normalized x1 should be zero
for ch.Batch(G.Nodes{node}) {
	n += bSize
	x := node.Value().Data().([]float64)
	for _, xv := range x {
		sumX += xv
	}
}

mean := sumX / float64(n)

fmt.Printf("mean of x1: %0.2f", math.Abs(mean))

Output:

rows read:  8500
mean of x1: 0.00

Example (Example2) ¶

// We can normalize fields by values we supply rather than the values in the epoch.
dataPath := os.Getenv("data") // path to data directory
fileName := dataPath + "/test1.csv"
f, e := os.Open(fileName)

if e != nil {
	panic(e)
}

// set up chutils file reader
rdr := file.NewReader(fileName, ',', '\n', 0, 0, 1, 0, f, 0)
e = rdr.Init("", chutils.MergeTree)

if e != nil {
	panic(e)
}

// determine data types
e = rdr.TableSpec().Impute(rdr, 0, .99)

if e != nil {
	panic(e)
}

bSize := 100
// Let's normalize x1 with location=40 and scale=1
ft := &FType{
	Name:       "x1",
	Role:       FRCts,
	Cats:       0,
	EmbCols:    0,
	Normalized: true,
	From:       "",
	FP:         &FParam{Location: 40, Scale: 1},
}
ch := NewChData("Test ch Pipeline",
	WithBatchSize(bSize),
	WithReader(rdr))

WithFtypes(FTypes{ft})(ch)

// create a graph & node to illustrate Batch()
g := G.NewGraph()
node := G.NewTensor(g, G.Float64, 2, G.WithName("x1"), G.WithShape(bSize, 1), G.WithInit(G.Zeroes()))

sumX := 0.0
n := 0
// run through the batches and compute the mean of x1; it is not zero because x1 is normalized by the supplied location, not the data mean
for ch.Batch(G.Nodes{node}) {
	n += bSize
	x := node.Value().Data().([]float64)
	for _, xv := range x {
		sumX += xv
	}
}

mean := sumX / float64(n)

fmt.Printf("mean of x1: %0.2f", math.Abs(mean))

Output:

rows read:  8500
mean of x1: 39.50

func (*ChData) BatchSize ¶

func (ch *ChData) BatchSize() int

BatchSize returns Pipeline batch size. Use WithBatchSize to set this.

func (*ChData) Cols ¶

func (ch *ChData) Cols(field string) int

Cols returns the # of columns in the field

func (*ChData) Describe ¶

func (ch *ChData) Describe(field string, topK int) string

Describe describes a field. If the field has role FRCat, the top k values (by frequency) are returned.

func (*ChData) Epoch ¶

func (ch *ChData) Epoch(setTo int) int

Epoch sets the epoch to setTo if setTo >= 0 and returns the epoch number.

func (*ChData) FieldList ¶

func (ch *ChData) FieldList() []string

FieldList returns a slice of field names in the Pipeline

func (*ChData) GData ¶

func (ch *ChData) GData() *GData

GData returns the Pipeline's GData.

func (*ChData) Get ¶

func (ch *ChData) Get(field string) *GDatum

Get returns a field's GDatum.

func (*ChData) GetFType ¶

func (ch *ChData) GetFType(field string) *FType

GetFType returns the field's FType

func (*ChData) GetFTypes ¶

func (ch *ChData) GetFTypes() FTypes

GetFTypes returns FTypes for ch Pipeline.

func (*ChData) Init ¶

func (ch *ChData) Init() (err error)

Init initializes the Pipeline.

Example ¶

dataPath := os.Getenv("data") // path to data directory
fileName := dataPath + "/test1.csv"
f, e := os.Open(fileName)

if e != nil {
	panic(e)
}

// set up chutils file reader
rdr := file.NewReader(fileName, ',', '\n', 0, 0, 1, 0, f, 0)
e = rdr.Init("", chutils.MergeTree)
if e != nil {
	panic(e)
}

// determine data types
e = rdr.TableSpec().Impute(rdr, 0, .99)

if e != nil {
	panic(e)
}

bSize := 100
ch := NewChData("Test ch Pipeline", WithBatchSize(bSize),
	WithReader(rdr), WithCycle(true),
	WithCats("y", "y1", "y2", "x4"),
	WithOneHot("yoh", "y"),
	WithOneHot("y1oh", "y1"),
	WithOneHot("x4oh", "x4"),
	WithNormalized("x1", "x2", "x3"),
	WithOneHot("y2oh", "y2"))
// initialize pipeline
e = ch.Init()

if e != nil {
	panic(e)
}

Output:

rows read:  8500

func (*ChData) IsCat ¶

func (ch *ChData) IsCat(field string) bool

IsCat returns true if field has role FRCat.

func (*ChData) IsCts ¶

func (ch *ChData) IsCts(field string) bool

IsCts returns true if the field has role FRCts.

func (*ChData) IsNormalized ¶

func (ch *ChData) IsNormalized(field string) bool

IsNormalized returns true if the field is normalized.

func (*ChData) IsSorted ¶

func (ch *ChData) IsSorted() bool

IsSorted returns true if the data has been sorted.

func (*ChData) Name ¶

func (ch *ChData) Name() string

Name returns Pipeline name

func (*ChData) Rows ¶

func (ch *ChData) Rows() int

Rows returns the number of rows of data in the Pipeline.

func (*ChData) SaveFTypes ¶

func (ch *ChData) SaveFTypes(fileName string) error

SaveFTypes saves the FTypes for the Pipeline.

Example ¶

// Field Types (FTypes) can be saved once they're created.  This preserves key information like
//  - The field role
//  - Location and Scale used in normalization
//  - Mapping of discrete fields
//  - Construction of one-hot fields
dataPath := os.Getenv("data") // path to data directory
fileName := dataPath + "/test1.csv"
f, e := os.Open(fileName)

if e != nil {
	panic(e)
}

// set up chutils file reader
rdr := file.NewReader(fileName, ',', '\n', 0, 0, 1, 0, f, 0)
e = rdr.Init("", chutils.MergeTree)

if e != nil {
	panic(e)
}

// determine data types
e = rdr.TableSpec().Impute(rdr, 0, .99)

if e != nil {
	panic(e)
}

bSize := 100
ch := NewChData("Test ch Pipeline", WithBatchSize(bSize),
	WithReader(rdr), WithCycle(true),
	WithCats("y", "y1", "y2", "x4"),
	WithOneHot("yoh", "y"),
	WithOneHot("y1oh", "y1"),
	WithOneHot("x4oh", "x4"),
	WithNormalized("x1", "x2", "x3"),
	WithOneHot("y2oh", "y2"))
// initialize pipeline
e = ch.Init()

if e != nil {
	panic(e)
}

outFile := os.TempDir() + "/seafan.json"

if e = ch.SaveFTypes(outFile); e != nil {
	panic(e)
}

saveFTypes, e := LoadFTypes(outFile)

if e != nil {
	panic(e)
}

ch1 := NewChData("Saved FTypes", WithReader(rdr), WithBatchSize(bSize),
	WithFtypes(saveFTypes))

if e := ch1.Init(); e != nil {
	panic(e)
}

// check the role in the pipeline built from the saved FTypes
fmt.Printf("Role of field y1oh: %s", ch1.GetFType("y1oh").Role)

Output:

rows read:  8500
rows read:  8500
Role of field y1oh: FROneHot

func (*ChData) Shuffle ¶

func (ch *ChData) Shuffle()

Shuffle shuffles the data

func (*ChData) Slice ¶

func (ch *ChData) Slice(sl Slicer) (Pipeline, error)

Slice returns a VecData Pipeline sliced according to sl

func (*ChData) Sort ¶

func (ch *ChData) Sort(field string, ascending bool) error

Sort sorts the data

func (*ChData) SortField ¶

func (ch *ChData) SortField() string

SortField returns the field the data is sorted on.

func (*ChData) String ¶

func (ch *ChData) String() string

type CostFunc ¶

type CostFunc func(model *NNModel) *G.Node

CostFunc is the function prototype for cost functions.

type DOLayer ¶

type DOLayer struct {
	//	position int     // insert dropout after layer AfterLayer
	DropProb float64 // dropout probability
}

DOLayer specifies a dropout layer. It occurs in the graph after dense layer AfterLayer (the input layer is layer 0).

func DropOutParse ¶

func DropOutParse(s string) (*DOLayer, error)

DropOutParse parses the arguments to a dropout layer.

type Desc ¶

type Desc struct {
	Name string    // Name is the name of feature we are describing
	N    int       // N is the number of observations
	U    []float64 // U is the slice of locations at which to find the quantile
	Q    []float64 // Q is the slice of empirical quantiles
	Mean float64   // Mean is the average of the data
	Std  float64   // standard deviation
}

Desc contains descriptive information of a float64 slice

func Assess ¶

func Assess(xy *XY, cutoff float64) (n int, precision, recall, accuracy float64, obs, fit *Desc, err error)

Assess returns a selection of statistics of the fit
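The classification statistics Assess reports can be sketched from a confusion matrix at a given cutoff. This assumes x holds fitted probabilities, y the 0/1 outcomes, and that a fitted value at or above the cutoff is a predicted positive (seafan may use a strict inequality). The helper assess is hypothetical and omits the n and Desc return values.

```go
package main

import (
	"errors"
	"fmt"
)

// assess is a hypothetical helper: fit holds fitted probabilities, obs the
// 0/1 outcomes, and fit >= cutoff counts as a predicted positive (an assumption).
func assess(fit, obs []float64, cutoff float64) (precision, recall, accuracy float64, err error) {
	if len(fit) == 0 || len(fit) != len(obs) {
		return 0, 0, 0, errors.New("fit and obs must be non-empty and of equal length")
	}
	var tp, fp, fn, tn float64
	for i := range fit {
		pred, actual := fit[i] >= cutoff, obs[i] == 1
		switch {
		case pred && actual:
			tp++
		case pred:
			fp++
		case actual:
			fn++
		default:
			tn++
		}
	}
	if tp+fp > 0 {
		precision = tp / (tp + fp) // share of predicted positives that are correct
	}
	if tp+fn > 0 {
		recall = tp / (tp + fn) // share of actual positives that are found
	}
	accuracy = (tp + tn) / float64(len(fit))
	return precision, recall, accuracy, nil
}

func main() {
	p, r, a, err := assess([]float64{0.9, 0.8, 0.3, 0.6}, []float64{1, 0, 1, 0}, 0.5)
	if err != nil {
		panic(err)
	}
	fmt.Printf("precision %.3f recall %.3f accuracy %.3f\n", p, r, a)
	// precision 0.333 recall 0.500 accuracy 0.250
}
```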

func NewDesc ¶

func NewDesc(u []float64, name string) (*Desc, error)

NewDesc creates a pointer to a new Desc struct instance with error checking.

u is a slice of values at which to find quantiles. If nil, a standard set is used.
name is the name of the feature (for printing).

func (*Desc) Populate ¶

func (d *Desc) Populate(x []float64, noSort bool, sl Slicer)

Populate calculates the descriptive statistics based on x. If noSort is true, the slice is not sorted.

func (*Desc) String ¶

func (d *Desc) String() string

type FCLayer ¶

type FCLayer struct {
	Size    int
	Bias    bool
	Act     Activation
	ActParm float64
}

FCLayer has details of a fully connected layer

func FCParse ¶

func FCParse(s string) (fc *FCLayer, err error)

FCParse parses the arguments to an FC layer

type FParam ¶

type FParam struct {
	Location float64 `json:"location"` // location parameter for *Cts
	Scale    float64 `json:"scale"`    // scale parameter for *Cts
	Default  any     `json:"default"`  // default level for *Dscrt
	Lvl      Levels  `json:"lvl"`      // map of values to int32 category for *Dscrt
}

FParam -- field parameters -- is summary data about a field. These values may not be derived from the current data but are applied to the current data.

type FRole ¶

type FRole int

FRole is the role a feature plays

const (
	FRCts FRole = 0 + iota
	FRCat
	FROneHot
	FREmbed
)

func (FRole) String ¶

func (i FRole) String() string

type FType ¶

type FType struct {
	Name       string
	Role       FRole
	Cats       int
	EmbCols    int
	Normalized bool
	From       string
	FP         *FParam
}

FType represents a single field. It holds key information about the feature: its role, dimensions, summary info.

func (*FType) String ¶

func (ft *FType) String() string

type FTypes ¶

type FTypes []*FType

func LoadFTypes ¶

func LoadFTypes(fileName string) (fts FTypes, err error)

LoadFTypes loads a file created by the FTypes Save method

func (FTypes) DropFields ¶

func (fts FTypes) DropFields(dropFields ...string) FTypes

DropFields will drop fields from the FTypes

func (FTypes) Get ¶

func (fts FTypes) Get(name string) *FType

Get returns the *FType of name

func (FTypes) Save ¶

func (fts FTypes) Save(fileName string) (err error)

Save saves FTypes to the JSON file fileName.

type Fit ¶

type Fit struct {
	// contains filtered or unexported fields
}

Fit is a struct for fitting an NNModel.

func NewFit ¶

func NewFit(nn *NNModel, epochs int, p Pipeline, opts ...FitOpts) *Fit

NewFit creates a new *Fit.

func (*Fit) BestEpoch ¶

func (ft *Fit) BestEpoch() int

BestEpoch returns the epoch of the best cost (validation or in-sample--whichever is specified)

func (*Fit) Do ¶

func (ft *Fit) Do() (err error)

Do is the fitting loop.

Example ¶

Verbose = false
bSize := 100
// generate a Pipeline of type *ChData that reads test1.csv in the data directory
pipe := chPipe(bSize, "test1.csv")
// generate model: target and features.  Target yoh is one-hot with 2 levels
mod := ModSpec{
	"Input(x1+x2+x3+x4)",
	"FC(size:3, activation:relu)",
	"DropOut(.1)",
	"FC(size:2, activation:softmax)",
	"Target(yoh)",
}
// The model has one hidden layer (3 units, relu), followed by dropout and a 2-unit softmax output.
nn, e := NewNNModel(mod, pipe, true, WithCostFn(CrossEntropy))

if e != nil {
	panic(e)
}

epochs := 150
ft := NewFit(nn, epochs, pipe)
e = ft.Do()

if e != nil {
	panic(e)
}
// Plot the in-sample cost in a browser (default: firefox)
e = ft.InCosts().Plot(&PlotDef{Title: "In-Sample Cost Curve", Height: 1200, Width: 1200,
	Show: true, XTitle: "epoch", YTitle: "Cost"}, true)

if e != nil {
	panic(e)
}

Output:

Example (Example2) ¶

// This example demonstrates how to use a validation sample for early stopping
Verbose = false
bSize := 100
// generate Pipelines of type *ChData that read test1.csv (model) and testVal.csv (validation) in the data directory
mPipe := chPipe(bSize, "test1.csv")
vPipe := chPipe(1000, "testVal.csv")

// generate model: target and features.  Target yoh is one-hot with 2 levels
mod := ModSpec{
	"Input(x1+x2+x3+x4)",
	"FC(size:3, activation:relu)",
	"DropOut(.1)",
	"FC(size:2, activation:softmax)",
	"Target(yoh)",
}
nn, e := NewNNModel(mod, mPipe, true, WithCostFn(CrossEntropy))

if e != nil {
	panic(e)
}

epochs := 150
ft := NewFit(nn, epochs, mPipe)
WithValidation(vPipe, 10)(ft)
e = ft.Do()

if e != nil {
	panic(e)
}
// Plot the in-sample cost in a browser (default: firefox)
e = ft.InCosts().Plot(&PlotDef{Title: "In-Sample Cost Curve", Height: 1200, Width: 1200,
	Show: true, XTitle: "epoch", YTitle: "Cost"}, true)

if e != nil {
	panic(e)
}

e = ft.OutCosts().Plot(&PlotDef{Title: "Validation Sample Cost Curve", Height: 1200, Width: 1200,
	Show: true, XTitle: "epoch", YTitle: "Cost"}, true)

if e != nil {
	panic(e)
}

Output:

func (*Fit) InCosts ¶

func (ft *Fit) InCosts() *XY

InCosts returns XY: X=epoch, Y=In-sample cost

func (*Fit) NNModel ¶ added in v0.0.24

func (ft *Fit) NNModel() *NNModel

NNModel returns the model.

func (*Fit) OutCosts ¶

func (ft *Fit) OutCosts() *XY

OutCosts returns XY: X=epoch, Y=validation cost

func (*Fit) OutFile ¶

func (ft *Fit) OutFile() string

OutFile returns the output file name

type FitOpts ¶

type FitOpts func(*Fit)

FitOpts functions add options

func WithL2Reg ¶

func WithL2Reg(penalty float64) FitOpts

WithL2Reg adds L2 regularization

func WithLearnRate ¶

func WithLearnRate(lrStart, lrEnd float64) FitOpts

WithLearnRate sets a learning rate function that declines linearly across the epochs.
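A linearly declining learning rate can be sketched as a straight line from lrStart at the first epoch to lrEnd at the last. The function linearLR is hypothetical, and the endpoints (epoch 0 through epochs-1) are an assumption; seafan's schedule may index epochs differently.

```go
package main

import "fmt"

// linearLR is a hypothetical schedule: the rate moves linearly from lrStart at
// epoch 0 to lrEnd at the last epoch (epochs-1).
func linearLR(lrStart, lrEnd float64, epoch, epochs int) float64 {
	if epochs <= 1 {
		return lrStart
	}
	frac := float64(epoch) / float64(epochs-1)
	return lrStart + (lrEnd-lrStart)*frac
}

func main() {
	for _, e := range []int{0, 5, 10} {
		fmt.Printf("epoch %2d: %.3f\n", e, linearLR(0.1, 0.0, e, 11))
	}
	// epoch  0: 0.100
	// epoch  5: 0.050
	// epoch 10: 0.000
}
```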

func WithOutFile ¶

func WithOutFile(fileName string) FitOpts

WithOutFile specifies the file root name to save the best model.

func WithShuffle ¶

func WithShuffle(interval int) FitOpts

WithShuffle shuffles the data every interval epochs. The default is 0 (never shuffle).

func WithValidation ¶

func WithValidation(p Pipeline, wait int) FitOpts

WithValidation adds a validation Pipeline for early stopping. The fit is stopped when the validation cost does not improve for wait epochs.
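The early-stopping rule described above is a "patience" rule. The sketch below scans a slice of per-epoch validation costs and stops once the best cost has not improved for wait epochs; earlyStop is hypothetical, and whether seafan counts the wait window exactly this way is an assumption.

```go
package main

import "fmt"

// earlyStop is a hypothetical patience rule: it scans per-epoch validation
// costs and stops once the best cost has not improved for wait epochs.
// It returns the best epoch and the epoch at which fitting stops (the last
// epoch if the rule never triggers), or -1, -1 when there are no costs.
func earlyStop(costs []float64, wait int) (best, stop int) {
	if len(costs) == 0 {
		return -1, -1
	}
	for e := 1; e < len(costs); e++ {
		if costs[e] < costs[best] {
			best = e
		}
		if e-best >= wait {
			return best, e
		}
	}
	return best, len(costs) - 1
}

func main() {
	costs := []float64{1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9}
	best, stop := earlyStop(costs, 3)
	fmt.Println(best, stop) // 2 5
}
```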

type GData ¶

type GData struct {
	// contains filtered or unexported fields
}

func NewGData ¶

func NewGData() *GData

NewGData returns a new instance of GData

func (*GData) AppendC ¶

func (gd *GData) AppendC(raw *Raw, name string, normalize bool, fp *FParam) error

AppendC appends a continuous feature

func (*GData) AppendD ¶

func (gd *GData) AppendD(raw *Raw, name string, fp *FParam) error

AppendD appends a discrete feature

func (*GData) AppendField ¶ added in v0.0.29

func (gd *GData) AppendField(newData *Raw, name string, fRole FRole) error

AppendField adds a field to gd

func (*GData) Close ¶ added in v0.0.27

func (gd *GData) Close() error

func (*GData) CountLines ¶ added in v0.0.27

func (gd *GData) CountLines() (numLines int, err error)

func (*GData) Drop ¶ added in v0.0.15

func (gd *GData) Drop(field string)

Drop drops a field from *GData

func (*GData) FieldCount ¶

func (gd *GData) FieldCount() int

FieldCount returns the number of fields in GData

func (*GData) FieldList ¶

func (gd *GData) FieldList() []string

FieldList returns the names of the fields in GData

func (*GData) Get ¶

func (gd *GData) Get(name string) *GDatum

Get returns a single feature from GData

func (*GData) GetRaw ¶

func (gd *GData) GetRaw(field string) (*Raw, error)

GetRaw returns the raw data for the field.

func (*GData) IsSorted ¶

func (gd *GData) IsSorted() bool

IsSorted returns true if GData has been sorted by SortField

func (*GData) Len ¶

func (gd *GData) Len() int

func (*GData) Less ¶

func (gd *GData) Less(i, j int) bool

func (*GData) MakeOneHot ¶

func (gd *GData) MakeOneHot(from, name string) error

MakeOneHot creates and appends a one-hot feature from a discrete feature.
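For reference, a one-hot feature expands each category code into a row of zeros with a single 1. This standalone sketch assumes the codes are already mapped to 0..nLevels-1 (as in an FRCat field, whose data is []int32) and uses a row-major []float64 layout. `oneHot` is hypothetical, and seafan's internal storage may differ.

```go
package main

import "fmt"

// oneHot expands int32 category codes (each in 0..nLevels-1) into a
// row-major []float64 with nLevels columns per row.
// Hypothetical helper; a code outside 0..nLevels-1 would panic.
func oneHot(codes []int32, nLevels int) []float64 {
	out := make([]float64, len(codes)*nLevels)
	for row, c := range codes {
		out[row*nLevels+int(c)] = 1
	}
	return out
}

func main() {
	fmt.Println(oneHot([]int32{2, 0, 1}, 3)) // [0 0 1 1 0 0 0 1 0]
}
```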

func (*GData) Read ¶ added in v0.0.27

func (gd *GData) Read(nTarget int, validate bool) (data []chutils.Row, valid []chutils.Valid, err error)

Read reads row(s) in the format of chutils. Note: valids are all chutils.Valid. The first call to Read recreates the raw data of existing fields, so memory use will increase.

func (*GData) Reset ¶ added in v0.0.27

func (gd *GData) Reset() error

func (*GData) Rows ¶

func (gd *GData) Rows() int

Rows returns the # of observations in each element of GData.

func (*GData) Seek ¶ added in v0.0.27

func (gd *GData) Seek(lineNo int) error

func (*GData) Shuffle ¶

func (gd *GData) Shuffle()

Shuffle shuffles the GData fields as a unit
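Shuffling "as a unit" means every field receives the same permutation, so each row stays intact. A minimal sketch using the standard library's `rand.Shuffle`. `shuffleTogether` is a hypothetical helper and does not reflect how seafan seeds its generator or stores its data.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffleTogether applies a single random permutation (seeded by seed) to
// every column, so values on the same row stay together.
// Hypothetical helper: all columns must have the same length.
func shuffleTogether(seed int64, cols ...[]float64) {
	if len(cols) == 0 {
		return
	}
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(cols[0]), func(i, j int) {
		for _, c := range cols {
			c[i], c[j] = c[j], c[i]
		}
	})
}

func main() {
	x := []float64{1, 2, 3, 4, 5}
	y := []float64{10, 20, 30, 40, 50} // y = 10*x on every row
	shuffleTogether(42, x, y)
	fmt.Println(x, y) // order is random, but y[i] == 10*x[i] still holds
}
```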

func (*GData) Slice ¶

func (gd *GData) Slice(sl Slicer) (*GData, error)

Slice creates a new GData sliced according to sl

func (*GData) Sort ¶

func (gd *GData) Sort(field string, ascending bool) error

Sort sorts the GData on field. Calling sort.Sort directly will cause a panic. Sorting a OneHot or Embedded field sorts on the underlying Categorical field.
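GData exposes Len, Less and Swap, the methods of sort.Interface, which is how sorting on one field can reorder every field together. The sketch below shows that pattern on a hypothetical `table` type (ascending order only). It is not GData; for GData itself, the Sort method remains the supported entry point.

```go
package main

import (
	"fmt"
	"sort"
)

// table is a hypothetical stand-in for GData: parallel columns plus the
// column currently used as the sort key.
type table struct {
	key   []float64   // column being sorted on
	other [][]float64 // remaining columns, kept aligned with key
}

// Len, Less and Swap satisfy sort.Interface; Swap moves whole rows.
func (t *table) Len() int           { return len(t.key) }
func (t *table) Less(i, j int) bool { return t.key[i] < t.key[j] }
func (t *table) Swap(i, j int) {
	t.key[i], t.key[j] = t.key[j], t.key[i]
	for _, c := range t.other {
		c[i], c[j] = c[j], c[i]
	}
}

// sortByKey sorts all rows in ascending order of key.
func (t *table) sortByKey() { sort.Sort(t) }

func main() {
	t := &table{key: []float64{3, 1, 2}, other: [][]float64{{30, 10, 20}}}
	t.sortByKey()
	fmt.Println(t.key, t.other[0]) // [1 2 3] [10 20 30]
}
```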

func (*GData) SortField ¶

func (gd *GData) SortField() string

SortField returns the field the GData is sorted on

func (*GData) Swap ¶

func (gd *GData) Swap(i, j int)

func (*GData) TableSpec ¶ added in v0.0.27

func (gd *GData) TableSpec() *chutils.TableDef

func (*GData) UpdateFts ¶ added in v0.0.10

func (gd *GData) UpdateFts(newFts FTypes) (*GData, error)

UpdateFts produces a new *GData using the given FTypes. The returned *GData contains only the fields in newFts.

type GDatum ¶

type GDatum struct {
	FT      *FType  // FT stores the details of the field: its role, # categories, mappings
	Summary Summary // Summary of the Data (e.g. distribution)
	Data    any     // Data. This will be either []float64 (FRCts, FROneHot, FREmbed) or []int32 (FRCat)
}

func (*GDatum) Describe ¶

func (g *GDatum) Describe(topK int) string

Describe returns summary statistics. topK is # of values to return for discrete fields

func (*GDatum) String ¶

func (g *GDatum) String() string

type Layer ¶

type Layer int

Layer types

const (
	Input Layer = 0 + iota
	FC
	DropOut
	Target
)

func (Layer) String ¶

func (i Layer) String() string

type Levels ¶

type Levels map[any]int32

Levels is a map from the underlying values of a discrete tensor to int32 values.

func ByCounts ¶

func ByCounts(data *Raw, sl Slicer) Levels

ByCounts builds a Levels map with the distribution of data

func ByPtr ¶

func ByPtr(data *Raw) Levels

ByPtr returns a mapping of the values of data to int32 values for modeling. The values of data are sorted, so the smallest will have a mapped value of 0.
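The ordering rule can be illustrated with float64 values: collect the distinct values, sort them, and number them from 0. `levelsFor` is a hypothetical helper; seafan's Levels is a map[any]int32 built from a *Raw and also handles other kinds.

```go
package main

import (
	"fmt"
	"sort"
)

// levelsFor maps each distinct value to an int32 code in sorted order,
// so the smallest value gets 0. Hypothetical helper for float64 data only.
func levelsFor(data []float64) map[float64]int32 {
	distinct := map[float64]bool{}
	for _, v := range data {
		distinct[v] = true
	}
	keys := make([]float64, 0, len(distinct))
	for v := range distinct {
		keys = append(keys, v)
	}
	sort.Float64s(keys)
	lv := make(map[float64]int32, len(keys))
	for i, v := range keys {
		lv[v] = int32(i)
	}
	return lv
}

func main() {
	lv := levelsFor([]float64{7, 3, 7, 5, 3})
	fmt.Println(lv[3], lv[5], lv[7]) // 0 1 2
}
```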

func (Levels) FindValue ¶

func (l Levels) FindValue(val int32) any

FindValue returns key that maps to val

func (Levels) Sort ¶

func (l Levels) Sort(byName, ascend bool) (key []any, val []int32)

Sort sorts Levels, returns sorted map as key, val slices

func (Levels) TopK ¶

func (l Levels) TopK(topNum int, byName, ascend bool) string

TopK returns the top k values either by name or by counts, ascending or descending
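A sketch of the "top k by counts, descending" case, assuming a count per level is already available (as ByCounts provides). `topKByCount` is hypothetical: it returns the keys rather than seafan's formatted string, and breaking ties by key is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// topKByCount returns the k keys with the largest counts, ties broken by key.
// Hypothetical helper; seafan's TopK can also sort by name or ascending.
func topKByCount(counts map[string]int32, k int) []string {
	keys := make([]string, 0, len(counts))
	for key := range counts {
		keys = append(keys, key)
	}
	sort.Slice(keys, func(i, j int) bool {
		if counts[keys[i]] != counts[keys[j]] {
			return counts[keys[i]] > counts[keys[j]]
		}
		return keys[i] < keys[j]
	})
	if k < len(keys) {
		keys = keys[:k]
	}
	return keys
}

func main() {
	counts := map[string]int32{"a": 5, "b": 12, "c": 7, "d": 12}
	fmt.Println(topKByCount(counts, 3)) // [b d c]
}
```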

type ModSpec ¶

type ModSpec []string

ModSpec holds the layers of a model; each slice element is a layer.

func LoadModSpec ¶

func LoadModSpec(fileName string) (ms ModSpec, err error)

LoadModSpec loads a ModSpec from file

func (ModSpec) Check ¶

func (m ModSpec) Check() error

Check checks that each layer name in the ModSpec is valid.

func (ModSpec) DropOut ¶

func (m ModSpec) DropOut(loc int) *DOLayer

DropOut returns the *DOLayer for layer loc if it is of type DropOut; otherwise it returns nil.

func (ModSpec) FC ¶

func (m ModSpec) FC(loc int) *FCLayer

FC returns the *FCLayer for layer loc if it is of type FC; otherwise it returns nil.

func (ModSpec) Inputs ¶

func (m ModSpec) Inputs(p Pipeline) (FTypes, error)

Inputs returns the FTypes of the input features

func (ModSpec) LType ¶

func (m ModSpec) LType(i int) (*Layer, error)

LType returns the layer type of layer i

func (ModSpec) Save ¶

func (m ModSpec) Save(fileName string) (err error)

Save saves the ModSpec to fileName.

func (ModSpec) Target ¶

func (m ModSpec) Target(p Pipeline) (*FType, error)

Target returns the *FType of the target

func (ModSpec) TargetName ¶ added in v0.0.29

func (m ModSpec) TargetName() string

type NNModel ¶

type NNModel struct {
	// contains filtered or unexported fields
}

NNModel structure

func LoadNN ¶

func LoadNN(fileRoot string, p Pipeline, build bool) (nn *NNModel, err error)

LoadNN restores a previously saved NNModel. fileRoot is the root name of the save file. p is the Pipeline with the field specs. If build is true, DropOut layers are included.

func NewNNModel ¶

func NewNNModel(modSpec ModSpec, pipe Pipeline, build bool, nnOpts ...NNOpts) (*NNModel, error)

NewNNModel creates a new NN model. Specs for fields in modSpec are pulled from pipe. If build is true, DropOut layers are included.

func PredictNN ¶

func PredictNN(fileRoot string, pipe Pipeline, build bool, opts ...NNOpts) (nn *NNModel, err error)

PredictNN reads in an NNModel from a file and populates it with a batch from pipe. Methods such as FitSlice and ObsSlice are immediately available.

Example ¶

// This example demonstrates fitting a regression model and predicting on new data
Verbose = false
bSize := 100
// generate a Pipeline of type *ChData that reads test1.csv in the data directory
mPipe := chPipe(bSize, "test1.csv")
vPipe := chPipe(1000, "testVal.csv")

// This model is OLS
mod := ModSpec{
	"Input(x1+x2+x3+x4)",
	"FC(size:1)",
	"Target(ycts)",
}
// The model is straightforward, with no hidden layers or dropouts.
nn, e := NewNNModel(mod, mPipe, true, WithCostFn(RMS))

if e != nil {
	panic(e)
}

epochs := 150
ft := NewFit(nn, epochs, mPipe)
e = ft.Do()

if e != nil {
	panic(e)
}

sf := os.TempDir() + "/nnTest"
e = nn.Save(sf)

if e != nil {
	panic(e)
}

pred, e := PredictNN(sf, vPipe, false)

if e != nil {
	panic(e)
}

fmt.Printf("out-of-sample correlation: %0.2f\n", stat.Correlation(pred.FitSlice(), pred.ObsSlice(), nil))

// clean up the two files written by nn.Save
_ = os.Remove(sf + "P.nn")
_ = os.Remove(sf + "S.nn")

Output:

out-of-sample correlation: 0.84

func PredictNNwFts ¶ added in v0.0.11

func PredictNNwFts(fileRoot string, pipe Pipeline, build bool, fts FTypes, opts ...NNOpts) (nn *NNModel, err error)

PredictNNwFts creates a new Pipeline that updates the input pipe to have the FTypes specified by fts. This matters, for instance, when a continuous input has been normalized: the normalization factors used in the NN must be the same as those used at build time. Save the FTypes from the model build and pass them here.
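Why the build-time factors matter: normalizing new data with its own statistics would map the same raw value to a different model input. The sketch assumes normalization means z-scoring with the population standard deviation; `normParams`, `fitNorm` and `apply` are hypothetical names standing in for what the saved FTypes carry.

```go
package main

import (
	"fmt"
	"math"
)

// normParams holds the location and scale computed on the build data.
// Hypothetical type; seafan keeps its factors in the FTypes instead.
type normParams struct{ mean, sd float64 }

// fitNorm computes the mean and population standard deviation of non-empty x.
func fitNorm(x []float64) normParams {
	var sum, sumSq float64
	for _, v := range x {
		sum += v
	}
	mean := sum / float64(len(x))
	for _, v := range x {
		sumSq += (v - mean) * (v - mean)
	}
	return normParams{mean: mean, sd: math.Sqrt(sumSq / float64(len(x)))}
}

// apply normalizes x with previously saved parameters.
func (p normParams) apply(x []float64) []float64 {
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = (v - p.mean) / p.sd
	}
	return out
}

func main() {
	p := fitNorm([]float64{2, 4, 6, 8}) // saved at build time: mean 5, sd sqrt(5)
	newData := []float64{5, 10}
	// Reusing p keeps the scaling the model was trained with.
	fmt.Println(p.apply(newData))
}
```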

func (*NNModel) Cols ¶ added in v0.0.19

func (m *NNModel) Cols() int

Cols returns # of columns in NNModel output

func (*NNModel) Cost ¶

func (m *NNModel) Cost() *G.Node

Cost returns cost node

func (*NNModel) CostFlt ¶

func (m *NNModel) CostFlt() float64

CostFlt returns the value of the cost node

func (*NNModel) CostFn ¶

func (m *NNModel) CostFn() CostFunc

CostFn returns cost function

func (*NNModel) Features ¶

func (m *NNModel) Features() G.Nodes

Features returns the model input features (continuous+embedded)

func (*NNModel) FitSlice ¶

func (m *NNModel) FitSlice() []float64

FitSlice returns fitted values as a slice

func (*NNModel) Fitted ¶

func (m *NNModel) Fitted() G.Result

Fitted returns fitted values as a G.Result

func (*NNModel) Fwd ¶

func (m *NNModel) Fwd()

Fwd builds forward pass

func (*NNModel) G ¶

func (m *NNModel) G() *G.ExprGraph

G returns model graph

func (*NNModel) InputFT ¶

func (m *NNModel) InputFT() FTypes

func (*NNModel) Inputs ¶

func (m *NNModel) Inputs() G.Nodes

Inputs returns the input nodes (continuous + embedded + observed)

func (*NNModel) ModSpec ¶ added in v0.0.24

func (m *NNModel) ModSpec() ModSpec

ModSpec returns the ModSpec for the model

func (*NNModel) Name ¶

func (m *NNModel) Name() string

Name returns model name

func (*NNModel) Obs ¶

func (m *NNModel) Obs() *G.Node

Obs returns the target value as a node

func (*NNModel) ObsSlice ¶

func (m *NNModel) ObsSlice() []float64

ObsSlice returns target values as a slice

func (*NNModel) Opts ¶ added in v0.0.24

func (m *NNModel) Opts() []NNOpts

Opts returns user-input With options

func (*NNModel) OutputCols ¶ added in v0.0.9

func (m *NNModel) OutputCols() int

OutputCols returns the number of columns in the output

func (*NNModel) Params ¶

func (m *NNModel) Params() G.Nodes

Params returns the model parameter nodes (weights, biases, embeddings)

func (*NNModel) Save ¶

func (m *NNModel) Save(fileRoot string) (err error)

Save saves a model to disk. Two files are created: <fileRoot>S.nn for the ModSpec and <fileRoot>P.nn for the parameters.

func (*NNModel) String ¶

func (m *NNModel) String() string

type NNOpts ¶

type NNOpts func(model1 *NNModel)

NNOpts -- NNModel options

func WithCostFn ¶

func WithCostFn(cf CostFunc) NNOpts

WithCostFn adds a cost function

func WithName ¶

func WithName(name string) NNOpts

WithName adds a name to the NNModel

type Opts ¶

type Opts func(c Pipeline)

Opts function sets an option to a Pipeline

func WithBatchSize ¶

func WithBatchSize(bsize int) Opts

WithBatchSize sets the batch size for the pipeline

func WithCallBack ¶

func WithCallBack(cb Opts) Opts

WithCallBack sets a callback function.

Example ¶

// This example shows how to create a callback during the fitting phase (fit.Do).
// The callback is called at the end of each epoch.  The callback below loads a new dataset after
// epoch 100.

Verbose = false
bSize := 100
// generate a Pipeline of type *ChData that reads test1.csv in the data directory
mPipe := chPipe(bSize, "test1.csv")
// This callback function replaces the initial dataset with testVal.csv after epoch 100
cb := func(c Pipeline) {
	switch d := c.(type) {
	case *ChData:
		if d.Epoch(-1) == 100 {
			dataPath := os.Getenv("data") // path to data directory
			fileName := dataPath + "/testVal.csv"
			f, e := os.Open(fileName)
			if e != nil {
				panic(e)
			}
			rdrx := file.NewReader(fileName, ',', '\n', 0, 0, 1, 0, f, 0)
			if e := rdrx.Init("", chutils.MergeTree); e != nil {
				panic(e)
			}
			if e := rdrx.TableSpec().Impute(rdrx, 0, .99); e != nil {
				panic(e)
			}
			rows, _ := rdrx.CountLines()
			fmt.Println("New data at end of epoch ", d.Epoch(-1))
			fmt.Println("Number of rows ", rows)
			WithReader(rdrx)(d)
		}
	}
}

WithCallBack(cb)(mPipe)

// This model is OLS
mod := ModSpec{
	"Input(x1+x2+x3+x4)",
	"FC(size:1)",
	"Target(ycts)",
}
// The model is straightforward, with no hidden layers or dropouts.
nn, e := NewNNModel(mod, mPipe, true, WithCostFn(RMS))

if e != nil {
	panic(e)
}

epochs := 150
ft := NewFit(nn, epochs, mPipe)
e = ft.Do()

if e != nil {
	panic(e)
}

Output:

New data at end of epoch  100
Number of rows  1000

func WithCats ¶

func WithCats(names ...string) Opts

WithCats specifies a list of categorical features.

func WithCycle ¶

func WithCycle(cycle bool) Opts

WithCycle sets the cycle bool. If false, the intent is for the Pipeline to generate a new data set for each epoch.

func WithFtypes ¶

func WithFtypes(fts FTypes) Opts

WithFtypes sets the FTypes of the Pipeline. This option is used to override the default levels.

func WithNormalized ¶

func WithNormalized(names ...string) Opts

WithNormalized sets the features to be normalized.

func WithOneHot ¶

func WithOneHot(name, from string) Opts

WithOneHot adds a one-hot field "name" based on field "from".

Example ¶

// This example shows a model that incorporates a feature (x4) first as one-hot and then as an embedding
Verbose = false
bSize := 100
// generate a Pipeline of type *ChData that reads test1.csv in the data directory
pipe := chPipe(bSize, "test1.csv")
// The feature x4 takes on values 0,1,2,...19.  chPipe treats this as a continuous feature.
// Let's override that and re-initialize the pipeline.
WithCats("x4")(pipe)
WithOneHot("x4oh", "x4")(pipe)

if e := pipe.Init(); e != nil {
	panic(e)
}
mod := ModSpec{
	"Input(x1+x2+x3+x4oh)",
	"FC(size:2, activation:softmax)",
	"Target(yoh)",
}
//
fmt.Println("x4 as one-hot")
nn, e := NewNNModel(mod, pipe, true)
if e != nil {
	panic(e)
}
fmt.Println(nn)
fmt.Println("x4 as embedding")
mod = ModSpec{
	"Input(x1+x2+x3+E(x4oh,3))",
	"FC(size:2, activation:softmax)",
	"Target(yoh)",
}
nn, e = NewNNModel(mod, pipe, true)
if e != nil {
	panic(e)
}

fmt.Println(nn)

Output:

x4 as one-hot

Inputs
Field x1
	continuous

Field x2
	continuous

Field x3
	continuous

Field x4oh
	one-hot
	derived from feature x4
	length 20

Target
Field yoh
	one-hot
	derived from feature y
	length 2

Model Structure
Input(x1+x2+x3+x4oh)
FC(size:2, activation:softmax)
Target(yoh)

Batch size: 100
24 FC parameters
0 Embedding parameters

x4 as embedding

Inputs
Field x1
	continuous

Field x2
	continuous

Field x3
	continuous

Field x4oh
	embedding
	derived from feature x4
	length 20
	embedding dimension of 3

Target
Field yoh
	one-hot
	derived from feature y
	length 2

Model Structure
Input(x1+x2+x3+E(x4oh,3))
FC(size:2, activation:softmax)
Target(yoh)

Batch size: 100
7 FC parameters
60 Embedding parameters

Example (Example2) ¶

// This example incorporates a dropout layer
Verbose = false
bSize := 100
// generate a Pipeline of type *ChData that reads test1.csv in the data directory
pipe := chPipe(bSize, "test1.csv")
// generate model: target and features.  Target yoh is one-hot with 2 levels
mod := ModSpec{
	"Input(x1+x2+x3+x4)",
	"FC(size:3, activation:relu)",
	"DropOut(.1)",
	"FC(size:2, activation:softmax)",
	"Target(yoh)",
}

nn, e := NewNNModel(mod, pipe, true,
	WithCostFn(CrossEntropy),
	WithName("Example With Dropouts"))

if e != nil {
	panic(e)
}
fmt.Println(nn)

Output:

Example With Dropouts
Inputs
Field x1
	continuous

Field x2
	continuous

Field x3
	continuous

Field x4
	continuous

Target
Field yoh
	one-hot
	derived from feature y
	length 2

Model Structure
Input(x1+x2+x3+x4)
FC(size:3, activation:relu)
DropOut(.1)
FC(size:2, activation:softmax)
Target(yoh)

Cost function: CrossEntropy

Batch size: 100
19 FC parameters
0 Embedding parameters

func WithReader ¶

func WithReader(rdr any) Opts

WithReader adds a reader.

type Pipeline ¶

type Pipeline interface {
	Init() error                            // initialize the pipeline
	Rows() int                              // # of observations in the pipeline (size of the epoch)
	Batch(inputs G.Nodes) bool              // puts the next batch in the input nodes
	Epoch(setTo int) int                    // manage epoch count
	IsNormalized(field string) bool         // true if feature is normalized
	IsCat(field string) bool                // true if feature is one-hot encoded
	Cols(field string) int                  // # of columns in the feature
	IsCts(field string) bool                // true if the feature is continuous
	GetFType(field string) *FType           // Get FType for the feature
	GetFTypes() FTypes                      // Get Ftypes for pipeline
	BatchSize() int                         // batch size
	FieldList() []string                    // fields available
	GData() *GData                          // return underlying GData
	Get(field string) *GDatum               // return data for field
	Slice(sl Slicer) (Pipeline, error)      // slice the pipeline
	Shuffle()                               // shuffle data
	Describe(field string, topK int) string // describes a field
}

The Pipeline interface specifies the methods required to be a data Pipeline. The Pipeline is the middleware between the data and the fitting routines.

type PlotDef ¶

type PlotDef struct {
	Show     bool    // Show - true = show graph in browser
	Title    string  // Title - plot title
	XTitle   string  // XTitle - x-axis title
	YTitle   string  // YTitle - y-axis title
	STitle   string  // STitle - sub-title (under the x-axis)
	Legend   bool    // Legend - true = show legend
	Height   float64 // Height - height of graph, in pixels
	Width    float64 // Width - width of graph, in pixels
	FileName string  // FileName - output file for graph (in html)
}

PlotDef specifies Plotly Layout features I commonly use.

type Raw ¶

type Raw struct {
	Kind reflect.Kind // type of elements of Data
	Data []any
}

Raw holds a raw slice of type Kind

func AllocRaw ¶

func AllocRaw(n int, kind reflect.Kind) *Raw

AllocRaw creates an empty slice of type kind and len n

func NewRaw ¶

func NewRaw(x []any, sl Slicer) *Raw

NewRaw creates a new raw slice from x. This assumes all elements of x are the same Kind

func NewRawCast ¶

func NewRawCast(x any, sl Slicer) *Raw

func (*Raw) Len ¶

func (r *Raw) Len() int

func (*Raw) Less ¶

func (r *Raw) Less(i, j int) bool

func (*Raw) Swap ¶

func (r *Raw) Swap(i, j int)

type SeaError ¶

type SeaError int

const (
	ErrPipe SeaError = 0 + iota
	ErrData
	ErrFields
	ErrGData
	ErrChData
	ErrModSpec
	ErrNNModel
	ErrDiags
	ErrVecData
)

func (SeaError) Error ¶

func (seaErr SeaError) Error() string

type Slice ¶

type Slice struct {
	// contains filtered or unexported fields
}

Slice implements generating Slicer functions for a feature. These are used to slice through the values of a discrete feature. For continuous features, it slices by quartile.

func NewSlice ¶

func NewSlice(feat string, minCnt int, pipe Pipeline, restrict []any) (*Slice, error)

NewSlice makes a new Slice based on feat in Pipeline pipe. minCnt is the minimum # of obs a slice must have to be used. restrict is a slice of values to restrict Iter to.

func (*Slice) Index ¶

func (s *Slice) Index() int32

Index returns the mapped value of the current value

func (*Slice) Iter ¶

func (s *Slice) Iter() bool

Iter iterates through the levels (ranges) of the feature. Returns false when done.

Example ¶

// An example of slicing through the data to generate diagnostics on subsets.
// The code here computes descriptive statistics of the fitted values for each of the 20 levels of x4.
Verbose = false
bSize := 100
// generate a Pipeline of type *ChData that reads test1.csv in the data directory
pipe := chPipe(bSize, "test1.csv")
// The feature x4 takes on values 0,1,2,...19.  chPipe treats this as a continuous feature.
// Let's override that and re-initialize the pipeline.

WithCats("x4")(pipe)
WithOneHot("x4oh", "x4")(pipe)

if e := pipe.Init(); e != nil {
	panic(e)
}

mod := ModSpec{
	"Input(x1+x2+x3+x4oh)",
	"FC(size:2, activation:softmax)",
	"Target(yoh)",
}
nn, e := NewNNModel(mod, pipe, true, WithCostFn(CrossEntropy))

if e != nil {
	panic(e)
}

ft := NewFit(nn, 100, pipe)

if e = ft.Do(); e != nil {
	panic(e)
}

sf := os.TempDir() + "/nnTest"
e = nn.Save(sf)

if e != nil {
	panic(e)
}

WithBatchSize(8500)(pipe)

pred, e := PredictNN(sf, pipe, false)
if e != nil {
	panic(e)
}

if e = AddFitted(pipe, sf, []int{1}, "fit", nil, false, nil); e != nil {
	panic(e)
}

_ = os.Remove(sf + "P.nn")
_ = os.Remove(sf + "S.nn")

s, e := NewSlice("x4", 0, pipe, nil)
if e != nil {
	panic(e)
}

fit, e := Coalesce(pred.FitSlice(), 2, []int{1}, false, false, nil)
if e != nil {
	panic(e)
}
desc, e := NewDesc(nil, "Descriptive Statistics")
if e != nil {
	panic(e)
}

// compute statistics of the fitted values within each level of x4
for s.Iter() {
	slicer := s.MakeSlicer()
	desc.Populate(fit, true, slicer)
	fmt.Printf("Slice x4=%v has %d observations\n", s.Value(), desc.N)
}

Output:

Slice x4=0 has 391 observations
Slice x4=1 has 408 observations
Slice x4=2 has 436 observations
Slice x4=3 has 428 observations
Slice x4=4 has 417 observations
Slice x4=5 has 472 observations
Slice x4=6 has 424 observations
Slice x4=7 has 455 observations
Slice x4=8 has 431 observations
Slice x4=9 has 442 observations
Slice x4=10 has 411 observations
Slice x4=11 has 413 observations
Slice x4=12 has 433 observations
Slice x4=13 has 416 observations
Slice x4=14 has 434 observations
Slice x4=15 has 367 observations
Slice x4=16 has 437 observations
Slice x4=17 has 433 observations
Slice x4=18 has 429 observations
Slice x4=19 has 423 observations

func (*Slice) MakeSlicer ¶

func (s *Slice) MakeSlicer() Slicer

MakeSlicer makes a Slicer function for the current value (discrete) or range (continuous) of the feature. Continuous features are sliced at the lower quartile, median and upper quartile, producing 4 slices.
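A standalone sketch of quartile slicing, with each Slicer written as a closure over the data. The quantile rule (linear interpolation) and the boundary convention (each slice includes its lower cut) are assumptions; `quantile` and `quartileSlicers` are hypothetical helpers, and `slicer` is a local copy of the Slicer signature.

```go
package main

import (
	"fmt"
	"sort"
)

// slicer mirrors seafan's Slicer signature: it returns true if the row is used.
type slicer func(row int) bool

// quantile returns the p-th quantile (0 <= p <= 1) of non-empty sorted data,
// interpolating linearly between neighboring order statistics.
func quantile(sorted []float64, p float64) float64 {
	h := p * float64(len(sorted)-1)
	lo := int(h)
	if lo+1 >= len(sorted) {
		return sorted[len(sorted)-1]
	}
	return sorted[lo] + (h-float64(lo))*(sorted[lo+1]-sorted[lo])
}

// quartileSlicers returns four Slicers over x: below Q1, [Q1, median),
// [median, Q3) and Q3 or above. x must be non-empty.
func quartileSlicers(x []float64) []slicer {
	s := append([]float64(nil), x...) // sorted copy; x keeps its row order
	sort.Float64s(s)
	q1, q2, q3 := quantile(s, 0.25), quantile(s, 0.5), quantile(s, 0.75)
	between := func(lo, hi float64) slicer {
		return func(row int) bool { return x[row] >= lo && x[row] < hi }
	}
	return []slicer{
		func(row int) bool { return x[row] < q1 },
		between(q1, q2),
		between(q2, q3),
		func(row int) bool { return x[row] >= q3 },
	}
}

func main() {
	x := []float64{8, 1, 6, 3, 5, 2, 7, 4}
	for i, sl := range quartileSlicers(x) {
		n := 0
		for row := range x {
			if sl(row) {
				n++
			}
		}
		fmt.Printf("slice %d: %d rows\n", i+1, n) // 2 rows in each slice
	}
}
```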

func (*Slice) Title ¶

func (s *Slice) Title() string

Title retrieves the auto-generated title

func (*Slice) Value ¶

func (s *Slice) Value() any

Value returns the current level of the discrete feature being sliced.

type Slicer ¶

type Slicer func(row int) bool

Slicer is an optional function that returns true if the row is to be used in calculations. This is used to subset the diagnostics to specific values.

func SlicerAnd ¶

func SlicerAnd(s1, s2 Slicer) Slicer

SlicerAnd creates a Slicer that is s1 && s2

func SlicerOr ¶

func SlicerOr(s1, s2 Slicer) Slicer

SlicerOr creates a Slicer that is s1 || s2
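Because a Slicer is just `func(row int) bool`, combining two of them means wrapping both in a new closure. This sketch redeclares a local `slicer` type and hypothetical `slicerAnd`/`slicerOr`/`countRows` helpers so it runs without seafan.

```go
package main

import "fmt"

// slicer mirrors seafan's Slicer: true means the row is used.
type slicer func(row int) bool

// slicerAnd keeps a row only if both s1 and s2 keep it.
func slicerAnd(s1, s2 slicer) slicer {
	return func(row int) bool { return s1(row) && s2(row) }
}

// slicerOr keeps a row if either s1 or s2 keeps it.
func slicerOr(s1, s2 slicer) slicer {
	return func(row int) bool { return s1(row) || s2(row) }
}

// countRows counts the rows in 0..n-1 that s keeps.
func countRows(n int, s slicer) int {
	c := 0
	for row := 0; row < n; row++ {
		if s(row) {
			c++
		}
	}
	return c
}

func main() {
	x4 := []int{0, 3, 3, 7, 12}
	isThree := slicer(func(row int) bool { return x4[row] == 3 })
	isBig := slicer(func(row int) bool { return x4[row] > 5 })
	early := slicer(func(row int) bool { return row < 2 })
	fmt.Println(countRows(len(x4), slicerAnd(isThree, early))) // 1
	fmt.Println(countRows(len(x4), slicerOr(isThree, isBig)))  // 4
}
```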

type Summary ¶

type Summary struct {
	NRows  int    // size of the data
	DistrC *Desc  // summary of continuous field
	DistrD Levels // summary of discrete field
}

Summary has descriptive statistics of a field using its current data.

type VecData ¶

type VecData struct {
	// contains filtered or unexported fields
}

func NewVecData ¶

func NewVecData(name string, data *GData, opts ...Opts) *VecData

func (*VecData) Batch ¶

func (vec *VecData) Batch(inputs G.Nodes) bool

func (*VecData) BatchSize ¶

func (vec *VecData) BatchSize() int

BatchSize returns Pipeline batch size

func (*VecData) Cols ¶

func (vec *VecData) Cols(field string) int

Cols returns the # of columns in the field

func (*VecData) Describe ¶

func (vec *VecData) Describe(field string, topK int) string

Describe describes a field. If the field has role FRCat, the top k values (by frequency) are returned.

func (*VecData) Epoch ¶

func (vec *VecData) Epoch(setTo int) int

Epoch sets the epoch to setTo if setTo >=0 and returns epoch #.

func (*VecData) FieldList ¶

func (vec *VecData) FieldList() []string

FieldList returns a slice of field names in the Pipeline

func (*VecData) GData ¶

func (vec *VecData) GData() *GData

GData returns the Pipeline's GData

func (*VecData) Get ¶

func (vec *VecData) Get(field string) *GDatum

Get returns a field's GDatum

func (*VecData) GetFType ¶

func (vec *VecData) GetFType(field string) *FType

GetFType returns the field's FType

func (*VecData) GetFTypes ¶

func (vec *VecData) GetFTypes() FTypes

GetFTypes returns FTypes for vec Pipeline.

func (*VecData) Init ¶

func (vec *VecData) Init() error

func (*VecData) IsCat ¶

func (vec *VecData) IsCat(field string) bool

IsCat returns true if field has role FRCat.

func (*VecData) IsCts ¶

func (vec *VecData) IsCts(field string) bool

IsCts returns true if the field has role FRCts.

func (*VecData) IsNormalized ¶

func (vec *VecData) IsNormalized(field string) bool

IsNormalized returns true if the field is normalized.

func (*VecData) IsSorted ¶

func (vec *VecData) IsSorted() bool

IsSorted returns true if the data has been sorted.

func (*VecData) Name ¶

func (vec *VecData) Name() string

Name returns Pipeline name

func (*VecData) Rows ¶

func (vec *VecData) Rows() int

Rows returns the # of rows of data in the Pipeline

func (*VecData) SaveFTypes ¶

func (vec *VecData) SaveFTypes(fileName string) error

SaveFTypes saves the FTypes for the Pipeline.

func (*VecData) Shuffle ¶

func (vec *VecData) Shuffle()

Shuffle shuffles the data.

func (*VecData) Slice ¶

func (vec *VecData) Slice(sl Slicer) (Pipeline, error)

func (*VecData) Sort ¶

func (vec *VecData) Sort(field string, ascending bool) error

Sort sorts the data on "field".

func (*VecData) SortField ¶

func (vec *VecData) SortField() string

SortField returns the name of the sort field.

func (*VecData) String ¶

func (vec *VecData) String() string

type XY ¶

type XY struct {
	X []float64
	Y []float64
}

XY struct holds (x,y) pairs as distinct slices

func NewXY ¶

func NewXY(x, y []float64) (*XY, error)

NewXY creates a pointer to a new XY with error checking

func (*XY) Interp ¶

func (p *XY) Interp(xNew []float64) (*XY, error)

Interp linearly interpolates XY at the points xNew.
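Linear interpolation between neighboring points can be sketched as follows. `interp` is hypothetical; it assumes X is sorted ascending and returns an error for points outside the data range, which may not match how XY.Interp treats such points.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// interp linearly interpolates (x, y) at each point of xNew.
// It assumes x is sorted ascending and rejects points outside [x[0], x[len-1]].
func interp(x, y, xNew []float64) ([]float64, error) {
	if len(x) != len(y) || len(x) < 2 {
		return nil, errors.New("need at least two (x, y) pairs of equal length")
	}
	out := make([]float64, len(xNew))
	for i, xv := range xNew {
		if xv < x[0] || xv > x[len(x)-1] {
			return nil, fmt.Errorf("x=%v outside data range", xv)
		}
		j := sort.SearchFloat64s(x, xv) // first index with x[j] >= xv
		if j == 0 {
			out[i] = y[0]
			continue
		}
		w := (xv - x[j-1]) / (x[j] - x[j-1]) // x[j-1] < xv <= x[j]
		out[i] = y[j-1] + w*(y[j]-y[j-1])
	}
	return out, nil
}

func main() {
	yNew, err := interp([]float64{0, 1, 2, 4}, []float64{0, 10, 20, 0}, []float64{0.5, 3, 4})
	if err != nil {
		panic(err)
	}
	fmt.Println(yNew) // [5 10 0]
}
```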

func (*XY) Len ¶

func (p *XY) Len() int

func (*XY) Less ¶

func (p *XY) Less(i, j int) bool

func (*XY) Plot ¶

func (p *XY) Plot(pd *PlotDef, scatter bool) error

Plot produces an XY Plotly plot

func (*XY) Sort ¶

func (p *XY) Sort() error

Sort sorts with error checking

func (*XY) String ¶

func (p *XY) String() string

func (*XY) Swap ¶

func (p *XY) Swap(i, j int)
