gogafit

command module
v0.0.0-...-1a937e6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 1, 2022 License: MIT Imports: 1 Imported by: 0

README

gogafit

Fitting and feature selection based on genetic algorithm.

CLI Commands

gogafit is a tool for training and selecting features for linear models.
Feature selection is done by minimizing diferent cost functions (AICC, BIC, EBIC).
Furthermore, there are a variety of additional tools for common tasks associated
with model selection. Data is represented by simple comma separated files, where
the first line is a header that gives a name to each column.

Usage:
  gogafit [command]

Available Commands:
  elm         Create an extreme learning machine network
  fit         Fit data
  help        Help about any command
  hook        Generate templates scripts for hooks
  plot        Plot the fit in a scatter plot
  poly        Add polynomial versions of a subset of the columns
  pred        Command for predicting from a GA model
  rmse        Calculate RMSE for a model
  ttsplit     Split a dataset in a train and test set

Flags:
      --config string   config file (default is $HOME/.gogafit.yaml)
  -h, --help            help for gogafit
  -t, --toggle          Help message for toggle

Use "gogafit [command] --help" for more information about a command.

Fit command

Fit linear model from a datafile. Features are selected via a genetic algorithm
by minimizing the given cost function. The data should be organized in a csv file where the
first line is a header which assigns a name to each feature.

Example file
feat1, feat2, feat3
0.1, 0.5, -0.2
0.5, 0.1, 1.0

the program fits a coefficient vector c such that Xc = y. The y vector is called the target vector.
It is extracted from the target parameter passed (e.g. y is the column in the file whose name is
<target>). The remaining columns are taken as the X matrix.

Minimal example:

gogafit fit -d myfile.csv -t feat3

will take the columns corresponding to feat1 and feat2 as the X matrix, and use the last column
(here named feat3) as the y vector.

Usage:
  gogafit fit [flags]

Flags:
  -c, --cost string     Cost function (aic|aicc|bic|ebic) (default "aicc")
  -s, --csplits uint    Number of splits used for cross over operations (default 2)
  -d, --data string     Datafile. Should be stored in CSV format
  -f, --fdratio float   Maximum ratio between number of selected features and number of data points (default 0.8)
  -h, --help            help for fit
  -i, --iprob float     Probability of activating a feature in the initial pool of genomes (default 0.5)
  -r, --lograte uint    Number generation between each log and backup of best solution (default 100)
  -m, --mutrate float   Mutation rate in genetic algorithm (default 0.5)
  -g, --numgen uint     Number of generations to run (default 100)
  -o, --out string      File where the result of the best model is placed (default "model.json")
  -p, --popsize uint    Population size (default 30)
  -y, --target string   Name of the column used as target in the fit (default "lastCol")
  -t, --type string     Fit-type: regression (reg) or classify (cls) (default "reg")

Global Flags:
      --config string   config file (default is $HOME/.gogafit.yaml)

TTsplit command

Splits the passed csv file into a training set and a validation set.
If the data file is called mydata.csv, the program will create two file

mydata_train.csv for the training data
mydate_test.csv for the test/validation data

Usage:
  gogafit ttsplit [flags]

Flags:
  -d, --data string      Dataset with data
  -f, --fraction float   Fraction of the data placed in the test set (default 0.2)
  -h, --help             help for ttsplit

Global Flags:
      --config string   config file (default is $HOME/.gogafit.yaml)

RMSE command

Calcualte the root mean square error for a given model.

The prediction is given by p = X.dot(c) where X is the design matrix and y is
c is the coefficient vector. RMSE is given by sqrt(mean((p - y)**2)).

The X and y vector is extracted from a csv file with the format (mydata.csv in the example below)

feat1, feat2, feat3
0.1, 0.2, 0.5
-0.4, 0.2, 0.5

the column passed as target is used as the y-vector and the remaining columns are used as the X
matrix.

The coefficient vector is extracted from a csv file of the form (mycoeff.json in the example below)

  {
	"TargetName": "feat3",
	"Datafile": "gafit/_testdata/dataset.csv",
	"Coeffs": {
	  "Var1": 2.9999990000004804,
	  "Var2": 1.0000003999997198
	},
	"Score": {
	  "Name": "aicc",
	  "Value": -25.7622420881808
	}
  }

where the first column is a name (that must match one of header fields in the data csvfile) and
the second column is the value of the coefficients. Coefficients corresponding to columns in the
data matrix that are not listed, is taken as zero.

Minimal example:

gogafit rmse -d mydata.csv -c mycoeff,csv

Usage:
  gogafit rmse [flags]

Flags:
  -d, --data string    Csv file with data
  -h, --help           help for rmse
  -m, --model string   JSON file with fitted model coefficients (default "model.json")

Global Flags:
      --config string   config file (default is $HOME/.gogafit.yaml)

Plot command

Create a scatter plot of the predictions of one or multiple datasets.
If we have the training data in a file called train.csv and validation data in a file
validate.csv. Our trained model is stored in model.json, it can be plotted by

gogafit plot -d train.csv,validate.csv -m model.json -o plot.png

Usage:
  gogafit plot [flags]

Flags:
  -d, --data string    Comma separated list of datasets (e.g. test, train
  -h, --help           help for plot
  -m, --model string   JSON file with the model
  -o, --out string     Image file where the model will be stored (default "gogafitPlot.png")

Global Flags:
      --config string   config file (default is $HOME/.gogafit.yaml)

Poly command

Adds columns representing a polynomial version of a subset of the columns.
	
Example:
We have the following csv file named data.csv

feat1,feat2,targetQuantity
1.0,2.0,3.0
2.0,1.0,2.0

and we wich to add a new feature which is feat1^2. Run

gogafit poly -d data.csv -t targetQuantity -o 2 -p feat1

this will create a file data_poly.csv

feat1,feat2,feat1p2targetQuantity
1.0,2.0,1.0,3.0
2.0,1.0,4.0,2.0

Usage:
  gogafit poly [flags]

Flags:
  -d, --data string      Original datafile
  -h, --help             help for poly
  -o, --order uint       Polynomial order (default 1)
  -p, --pattern string   Polynomial versions of all features containing this substring will be added
  -y, --target string    Name of the quantity used as target property

Global Flags:
      --config string   config file (default is $HOME/.gogafit.yaml)

ELM command

Calculates the output of a hiddan layer in an extreme learning machine.
The hidden layer consists of the given number of activation functions.

gogafit elm -d mydata.csv -r 100 -s 200 -t feat3 

creates an ELM with 100 relu neurons and 200 sigmoid neurons. Similar to the other commands,
the format of the data file is

feat1, feat2, feat3
0.2, 0.1, 0.4
...

where the name of the column corresponding to the target feature is specified via the -y flag.

Usage:
  gogafit elm [flags]

Flags:
  -d, --data string     Datafile with inputs for the input layer
  -h, --help            help for elm
  -r, --relu uint       Number of rectifier activation functions in the hidden layer (default 1)
  -s, --sig uint        Number of sigmoid activation functions in the hidden layer (default 1)
  -y, --target string   Name of columns that represent the target values

Global Flags:
      --config string   config file (default is $HOME/.gogafit.yaml)

Hook command

This command generates templates for hooks

Usage:
  gogafit hook [flags]

Flags:
  -h, --help            help for hook
  -o, --out string      output file (default "cost.py")
  -p, --pyexec string   name of the python executable (default "python")
  -t, --type string     Template type (currently only cost supported) (default "cost")

Global Flags:
      --config string   config file (default is $HOME/.gogafit.yaml)

Documentation

The Go Gopher

There is no documentation for this package.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL