Documentation ¶
Overview ¶
Package moru provides functions to run the models created by goMortgage.
There are two ways to use moru: ScoreToTable and ScoreToPipe.
With ScoreToTable, the user provides
- A ClickHouse table that has the features required by the model(s)
- Pointers to the directories of the models created by goMortgage
The input table, augmented by the model outputs, is saved back to a new ClickHouse table.
With ScoreToPipe, the user provides
- A seafan.Pipeline with model features
- Pointers to the directories of the models created by goMortgage
The model outputs are added to the pipeline.
Index ¶
- func GatherFts(models []ModelDef) (ftMods []sea.FTypes, obsFts, cats sea.FTypes, err error)
- func InsertTable(tableName string, pipe sea.Pipeline, conn *chutils.Connect) error
- func MakeTable(tableName, orderBy string, pipe sea.Pipeline, conn *chutils.Connect) error
- func NewPipe(table, orderBy string, models []ModelDef, startRow, batchSize int, ...) (sea.Pipeline, error)
- func Rows(tableName string, conn *chutils.Connect) int
- func ScoreToPipe(pipe sea.Pipeline, models []ModelDef, ftMods []sea.FTypes, obsFts sea.FTypes) error
- func ScoreToTable(sourceTable, destTable, orderBy string, models []ModelDef, ...) error
- type ModelDef
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func GatherFts ¶ added in v0.0.2
func GatherFts(models []ModelDef) (ftMods []sea.FTypes, obsFts, cats sea.FTypes, err error)
GatherFts collects the FTypes of the fields in models
- models: models to collect FTypes for.
Returns:
- ftMods: the features (as sea.FTypes) in each model, same length as models
- obsFts: the sea.FTypes of the target for each model. If the target is normalized, then this is needed to unnormalize the model output.
- cats: list of the categorical features in all the models. This is needed to make sure all categorical features are treated as such. This is not a problem for strings, but ints default to FRCts, not FRCat.
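To illustrate why obsFts matters, the sketch below reverses a location/scale normalization on fitted values. It assumes the target was normalized as (x - location) / scale; that formula, the targetParams type, and the unnormalize function are assumptions made for this example and are not taken from moru or seafan.

```go
package main

import "fmt"

// targetParams is a hypothetical stand-in for the normalization parameters
// that would accompany a normalized target. The field names are assumptions.
type targetParams struct {
	location float64 // value subtracted during normalization
	scale    float64 // value divided by during normalization
}

// unnormalize maps fitted values back to the original units of the target,
// assuming they were normalized as (x - location) / scale.
func unnormalize(fitted []float64, p targetParams) []float64 {
	out := make([]float64, len(fitted))
	for i, z := range fitted {
		out[i] = z*p.scale + p.location
	}
	return out
}

func main() {
	p := targetParams{location: 100, scale: 20}
	fmt.Println(unnormalize([]float64{0, 1, -0.5}, p))
}
```

Without the target's parameters, the model output would stay in normalized units, which is why they are collected alongside the model features.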
func InsertTable ¶
func InsertTable(tableName string, pipe sea.Pipeline, conn *chutils.Connect) error
InsertTable inserts the data in pipe into the ClickHouse table tableName. The table must already exist.
- tableName: name of table to insert into (table must exist)
- pipe: pipeline with data to insert
- conn: ClickHouse connector
func MakeTable ¶
func MakeTable(tableName, orderBy string, pipe sea.Pipeline, conn *chutils.Connect) error
MakeTable makes ClickHouse table tableName based on the fields in pipe. MakeTable overwrites the table if it exists.
- tableName: name of ClickHouse table to create
- orderBy: comma-separated list of fields that create a unique key
- pipe: Pipeline containing fields to create for the table
- conn: connector to ClickHouse
func NewPipe ¶
func NewPipe(table, orderBy string, models []ModelDef, startRow, batchSize int, conn *chutils.Connect) (sea.Pipeline, error)
NewPipe creates a new data pipeline from "table" and appends the model outputs specified by "models". The pipeline consists of rows startRow to startRow+batchSize-1 of table. The unique key orderBy is needed so that ClickHouse will correctly run through the table over multiple calls to NewPipe.
Arguments:
- table: name of the ClickHouse table with the data used to calculate the model outputs
- orderBy: comma-separated field list that produces a unique key
- models: model location and fields to create
- startRow: first row of table to pull for the pipeline
- batchSize: number of rows of table to pull
- conn: connector to ClickHouse
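The startRow and batchSize arguments suggest walking a table in consecutive chunks over repeated calls. The sketch below only computes the (startRow, batchSize) pairs such a loop might use; it does not call NewPipe. The batch type, the batches function, the choice of first row number, and the shortening of the final batch are all assumptions for illustration.

```go
package main

import "fmt"

// batch holds the two values a caller might pass to NewPipe for one chunk.
type batch struct {
	startRow  int
	batchSize int
}

// batches splits a table of totalRows rows into consecutive batches of at
// most batchSize rows. Numbering starts at firstRow, which is an assumption
// here; check which base the package expects.
func batches(totalRows, batchSize, firstRow int) []batch {
	if totalRows <= 0 || batchSize <= 0 {
		return nil
	}
	var out []batch
	for done := 0; done < totalRows; done += batchSize {
		size := batchSize
		if remaining := totalRows - done; remaining < size {
			size = remaining // final batch may be short
		}
		out = append(out, batch{startRow: firstRow + done, batchSize: size})
	}
	return out
}

func main() {
	// 10 rows in batches of 4, numbering rows from 0.
	for _, b := range batches(10, 4, 0) {
		fmt.Printf("rows %d to %d\n", b.startRow, b.startRow+b.batchSize-1)
	}
}
```

Each batch covers rows startRow to startRow+batchSize-1, matching the range described for NewPipe. A stable, unique orderBy is what keeps these ranges from overlapping or skipping rows between calls.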
func Rows ¶
func Rows(tableName string, conn *chutils.Connect) int
Rows returns the number of rows in a ClickHouse table.
- tableName: name of the table to count rows for
- conn: ClickHouse connector
func ScoreToPipe ¶ added in v0.0.2
func ScoreToPipe(pipe sea.Pipeline, models []ModelDef, ftMods []sea.FTypes, obsFts sea.FTypes) error
ScoreToPipe adds the model outputs to pipe. Note: the maps for categorical features should never use a default value in their sea.FType, as this compression can cause the model output to be wrong. The default values will be applied in ScoreToPipe.
- pipe: pipeline to add model outputs to.
- models: definitions of models to add
- ftMods: slice of features for each model. If nil, ScoreToPipe builds it.
- obsFts: FType of the targets of models.
func ScoreToTable ¶ added in v0.0.2
func ScoreToTable(sourceTable, destTable, orderBy string, models []ModelDef, batchSize, nWorker int, conn *chutils.Connect) error
ScoreToTable creates destTable from sourceTable adding fitted values from one or more models.
- sourceTable: source ClickHouse table with features required by the model
- destTable: created ClickHouse table with sourceTable fields plus model outputs
- orderBy: comma-separated values of sourceTable that create a unique key
- models: model specifications (location, field names and columns)
- batchSize: number of rows to process as a group
- nWorker: number of concurrent processes
- conn: ClickHouse connector
Set sea.Verbose = false to suppress messages during run.
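One way batchSize and nWorker could interact is a worker pool in which each worker handles one batch at a time. The sketch below uses only the standard library and is an illustration of that pattern, not moru's implementation; processBatches and the work callback are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// processBatches calls work once for every batch index in [0, nBatches),
// spreading the calls over nWorker goroutines. It returns one of the errors
// reported by work, or nil if every batch succeeded.
func processBatches(nBatches, nWorker int, work func(batch int) error) error {
	if nBatches < 0 {
		nBatches = 0
	}
	if nWorker < 1 {
		nWorker = 1
	}
	jobs := make(chan int)
	errs := make(chan error, nBatches) // room for one error per batch
	var wg sync.WaitGroup
	for w := 0; w < nWorker; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range jobs {
				if err := work(b); err != nil {
					errs <- err
				}
			}
		}()
	}
	for b := 0; b < nBatches; b++ {
		jobs <- b
	}
	close(jobs)
	wg.Wait()
	close(errs)
	if err, ok := <-errs; ok {
		return err
	}
	return nil
}

func main() {
	var mu sync.Mutex
	rows := 0
	err := processBatches(5, 2, func(batch int) error {
		mu.Lock()
		defer mu.Unlock()
		rows += 100 // pretend each batch scored 100 rows
		return nil
	})
	fmt.Println(rows, err)
}
```

In a pattern like this, a larger batchSize means fewer, heavier units of work, while nWorker bounds how many batches are in flight at once.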
Types ¶
type ModelDef ¶
type ModelDef struct {
	Location     string   // directory with the procyonb model
	FieldNames   []string // slice of field names we're calculating
	FieldColumns [][]int  // columns to sum corresponding to the field names
}
ModelDef defines a model and the fields to calculate from it. The models are run in index order.
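The relationship between FieldNames and FieldColumns can be sketched as follows: each field name receives the sum of the listed columns of the model output. The modelDef type below is a local copy with the same fields so the example has no external dependencies; sumColumns, the field names, the path, and the numbers are hypothetical.

```go
package main

import "fmt"

// modelDef is a local stand-in with the same fields as ModelDef.
type modelDef struct {
	Location     string
	FieldNames   []string
	FieldColumns [][]int
}

// sumColumns returns one value per field name: the sum of the entries of
// output (a single row of model output) at the columns listed for that field.
func sumColumns(def modelDef, output []float64) (map[string]float64, error) {
	if len(def.FieldNames) != len(def.FieldColumns) {
		return nil, fmt.Errorf("have %d field names but %d column lists",
			len(def.FieldNames), len(def.FieldColumns))
	}
	result := make(map[string]float64, len(def.FieldNames))
	for i, name := range def.FieldNames {
		total := 0.0
		for _, col := range def.FieldColumns[i] {
			if col < 0 || col >= len(output) {
				return nil, fmt.Errorf("column %d out of range for field %s", col, name)
			}
			total += output[col]
		}
		result[name] = total
	}
	return result, nil
}

func main() {
	def := modelDef{
		Location:     "/path/to/model", // hypothetical directory
		FieldNames:   []string{"current", "delinquent"},
		FieldColumns: [][]int{{0}, {1, 2, 3}},
	}
	// One row of model output with four columns.
	fitted, err := sumColumns(def, []float64{0.5, 0.25, 0.125, 0.125})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(fitted["current"], fitted["delinquent"])
}
```

Here "delinquent" aggregates three output columns into a single field, which is the kind of grouping the FieldColumns comment describes.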