moru

Version: v0.0.4 (latest)
Warning

This package is not in the latest version of its module.

Published: Apr 5, 2023 License: Apache-2.0 Imports: 9 Imported by: 0

README

package moru


Package moru provides functions to run the models created by goMortgage. Using moru is quite straightforward.

There are two options: ScoreToTable and ScoreToPipe.

With ScoreToTable, the user provides

  • A ClickHouse table that has the features required by the model(s)
  • Paths to the directories of the models created by goMortgage

The input table, augmented by the model outputs, is saved to a new ClickHouse table.

With ScoreToPipe, the user provides

  • A seafan.Pipeline with model features
  • Paths to the directories of the models created by goMortgage

The model outputs are added to the pipeline.

Documentation


Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GatherFts added in v0.0.2

func GatherFts(models []ModelDef) (ftMods []sea.FTypes, obsFts, cats sea.FTypes, err error)

GatherFts collects the FTypes of the fields in models.

  • models: models to collect FTypes for.

Returns:

  • ftMods: the features (as sea.FTypes) in each model, same length as models
  • obsFts: the sea.FTypes of the target for each model. If the target is normalized, then this is needed to unnormalize the model output.
  • cats: list of the categorical (cat) features in all the models. This is needed to make sure all categorical features are treated as such. This is not a problem for string fields, but int fields default to FRCts, not FRCat.

func InsertTable

func InsertTable(tableName string, pipe sea.Pipeline, conn *chutils.Connect) error

InsertTable inserts the data in pipe into the ClickHouse table tableName. The table must already exist.

  • tableName: name of table to insert into (table must exist)
  • pipe: pipeline with data to insert
  • conn: ClickHouse connector

func MakeTable

func MakeTable(tableName, orderBy string, pipe sea.Pipeline, conn *chutils.Connect) error

MakeTable makes ClickHouse table tableName based on the fields in pipe. MakeTable overwrites the table if it exists.

  • tableName: name of ClickHouse table to create
  • orderBy: comma-separated list of fields in pipe that create a unique key
  • pipe: Pipeline containing fields to create for the table
  • conn: connector to ClickHouse

func NewPipe

func NewPipe(table, orderBy string, models []ModelDef, startRow, batchSize int, conn *chutils.Connect) (sea.Pipeline, error)

NewPipe creates a new data pipeline from "table" and appends the model outputs specified by "models". The pipeline consists of rows startRow to startRow+batchSize-1 of table. The unique key orderBy is needed so that ClickHouse will correctly run through the table over multiple calls to NewPipe.

Arguments:

  • table: name of the ClickHouse table with data to calculate model
  • orderBy: comma-separated field list that produces a unique key
  • models: model location and fields to create
  • startRow: first row of table to pull for the pipeline
  • batchSize: number of rows of table to pull
  • conn: connector to ClickHouse
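The reason NewPipe needs a unique key: ClickHouse does not guarantee row order unless the query sorts on it, so paging through a table with LIMIT/OFFSET only yields disjoint, complete batches when the ORDER BY is a total order. The sketch below builds such a paging query; treating startRow as a zero-based offset is an assumption, and the table and field names are hypothetical.

```go
package main

import "fmt"

// batchQuery returns a query for rows startRow .. startRow+batchSize-1 of
// table. Sorting on the unique key orderBy makes each batch deterministic,
// so consecutive calls neither skip nor repeat rows. startRow is assumed
// to be a zero-based offset.
func batchQuery(table, orderBy string, startRow, batchSize int) string {
	return fmt.Sprintf("SELECT * FROM %s ORDER BY %s LIMIT %d OFFSET %d",
		table, orderBy, batchSize, startRow)
}

func main() {
	// Walk a hypothetical 250,000-row table in batches of 100,000.
	for start := 0; start < 250000; start += 100000 {
		fmt.Println(batchQuery("loans", "lnId, month", start, 100000))
	}
}
```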

func Rows

func Rows(tableName string, conn *chutils.Connect) int

Rows returns the number of rows in a ClickHouse table.

  • tableName: name of the table to count rows in
  • conn: ClickHouse connector

func ScoreToPipe added in v0.0.2

func ScoreToPipe(pipe sea.Pipeline, models []ModelDef, ftMods []sea.FTypes, obsFts sea.FTypes) error

ScoreToPipe adds the model outputs to pipe. Note: the maps for categorical features should never use a default value in their sea.FTypes, as this compression can cause the model output to be wrong. ScoreToPipe applies the default values itself.

  • pipe: pipeline to add model outputs to.
  • models: definitions of models to add
  • ftMods: slice of features for each model. If nil, ScoreToPipe builds it.
  • obsFts: FType of the targets of models.

func ScoreToTable added in v0.0.2

func ScoreToTable(sourceTable, destTable, orderBy string, models []ModelDef, batchSize, nWorker int, conn *chutils.Connect) error

ScoreToTable creates destTable from sourceTable adding fitted values from one or more models.

  • sourceTable: source ClickHouse table with features required by the model
  • destTable: created ClickHouse table with sourceTable fields plus model outputs
  • orderBy: comma-separated values of sourceTable that create a unique key
  • models: model specifications (location, field names and columns)
  • batchSize: number of rows to process as a group
  • nWorker: number of concurrent workers
  • conn: ClickHouse connector

Set sea.Verbose = false to suppress messages during the run.

Types

type ModelDef

type ModelDef struct {
	Location     string   // directory with the procyonb model
	FieldNames   []string // slice of field names we're calculating
	FieldColumns [][]int  // columns to sum corresponding to the field names
}

ModelDef defines a model and the fields to calculate from it. The models are run in index order.

func (*ModelDef) Error added in v0.0.4

func (mdef *ModelDef) Error() error
