example

package
v0.0.0-...-7d47eef Latest
Warning

This package is not in the latest version of its module.

Published: Apr 12, 2023 | License: Apache-2.0, MIT | Imports: 4 | Imported by: 0

Documentation

Overview

Package example defines "Batch", a batch of examples, and "Features", the specification of the input features of a model.

Index

Constants

const OutOfVabulary = uint32(0)

OutOfVabulary (OOV) is the special value that represents unknown or too-rare categorical values.

Variables

This section is empty.

Functions

func GetColumn

func GetColumn(name string, dataspec *dataspec_pb.DataSpecification) *dataspec_pb.Column

GetColumn gets the column spec from its name.

func NewFeatures

NewFeatures converts a dataspec into a feature definition used by an engine.

Types

type Batch

type Batch struct {

	// Values of the unary features, stored {example major, feature minor}.
	NumericalValues   []float32
	CategoricalValues []uint32
	// contains filtered or unexported fields
}

Batch is a set of examples.
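The struct comment above describes the value slices as example major, feature minor. The following is a minimal sketch of what such a layout usually means for a flat slice, not the package's actual implementation. The type `miniBatch` and its methods are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// miniBatch is a hypothetical stand-in (not part of the package) that stores
// numerical values example-major, feature-minor in one flat slice.
type miniBatch struct {
	numFeatures int
	values      []float32
}

// newMiniBatch allocates room for numExamples rows of numFeatures values.
func newMiniBatch(numExamples, numFeatures int) *miniBatch {
	return &miniBatch{
		numFeatures: numFeatures,
		values:      make([]float32, numExamples*numFeatures),
	}
}

// index returns the flat position of the pair (exampleIdx, featureID).
// All the features of one example are contiguous in memory.
func (b *miniBatch) index(exampleIdx, featureID int) int {
	return exampleIdx*b.numFeatures + featureID
}

// set writes a value at the position computed by index.
func (b *miniBatch) set(exampleIdx, featureID int, v float32) {
	b.values[b.index(exampleIdx, featureID)] = v
}

func main() {
	b := newMiniBatch(2, 3)
	b.set(1, 0, 4.5)
	fmt.Println(b.index(1, 0), b.values) // prints "3 [0 0 0 4.5 0 0]"
}
```

With this layout, reading every feature of a single example touches one contiguous memory range, which tends to suit per-example inference.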

func NewBatch

func NewBatch(numExamples int, features *Features) *Batch

NewBatch creates a batch of examples. The example values are in an undefined state: before being used, the feature values should be set either with "FillMissing" or "Set*".

func (*Batch) Clear

func (batch *Batch) Clear()

Clear clears the content of a batch. After a call to Clear, the feature values are in an undefined state, i.e., in the same state as after "NewBatch".

func (*Batch) CopyFrom

func (batch *Batch) CopyFrom(src *Batch, beginIdx int, endIdx int)

CopyFrom copies the content of a batch from another batch. Assumes that both batches have the exact same features (e.g., they were created by the same engine).

func (*Batch) FillMissing

func (batch *Batch) FillMissing()

FillMissing sets all the feature values of all the examples as missing.

This method is equivalent to, but more efficient than, calling the "SetMissing*" methods for all the features and all the examples.

func (*Batch) NumAllocatedExamples

func (batch *Batch) NumAllocatedExamples() int

NumAllocatedExamples is the number of allocated examples.

func (*Batch) SetCategorical

func (batch *Batch) SetCategorical(exampleIdx int, feature CategoricalFeatureID, value uint32)

SetCategorical sets the value of a categorical feature as an integer.

func (*Batch) SetCategoricalFromString

func (batch *Batch) SetCategoricalFromString(exampleIdx int, feature CategoricalFeatureID, rawValue string) error

SetCategoricalFromString sets the value of a categorical feature from its string representation.
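As an illustration of how a string could be turned into a categorical integer, here is a hedged sketch. It assumes a per-feature dictionary from strings to integer ids and assumes that strings absent from the dictionary map to the out-of-vocabulary value 0. The names `lookupCategorical` and `outOfVocabulary` are hypothetical, and the real method may behave differently (for example, by returning an error).

```go
package main

import "fmt"

// outOfVocabulary is a local, hypothetical mirror of the OOV value (0)
// documented in the Constants section.
const outOfVocabulary = uint32(0)

// lookupCategorical is a hypothetical helper: it returns the integer id of
// raw in dict, or the OOV value when raw is not in the dictionary.
func lookupCategorical(dict map[string]uint32, raw string) uint32 {
	if id, ok := dict[raw]; ok {
		return id
	}
	return outOfVocabulary
}

func main() {
	// Assumed dictionary for one categorical feature; ids start at 1
	// because 0 is reserved for OOV in this sketch.
	dict := map[string]uint32{"UK": 1, "FR": 2}
	fmt.Println(lookupCategorical(dict, "FR"), lookupCategorical(dict, "??")) // prints "2 0"
}
```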

func (*Batch) SetFromFields

func (batch *Batch) SetFromFields(exampleIdx int, header []string, values []string) error

SetFromFields sets all the fields of an example from CSV-like fields and a header. This method is slow and should not be used in speed-sensitive code.

Empty fields and fields with the value "NA" are considered "missing values".

Example:

examples.SetFromFields(0, []string{"a", "b", "c"}, []string{"0.5", "UK", "NA"})
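The missing-value rule stated above (empty fields and "NA") can be sketched for numerical fields as follows. This is an illustrative stand-alone helper, not the package's code. The names `isMissing` and `parseNumericalField` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// isMissing reports whether a raw CSV-like field denotes a missing value:
// an empty field or the literal "NA".
func isMissing(raw string) bool {
	return raw == "" || raw == "NA"
}

// parseNumericalField is a hypothetical helper. It returns the parsed value,
// whether a value was present, and an error for fields that are neither
// missing nor valid numbers.
func parseNumericalField(raw string) (float32, bool, error) {
	if isMissing(raw) {
		return 0, false, nil
	}
	v, err := strconv.ParseFloat(raw, 32)
	if err != nil {
		return 0, false, fmt.Errorf("invalid numerical field %q: %w", raw, err)
	}
	return float32(v), true, nil
}

func main() {
	header := []string{"a", "b", "c"}
	values := []string{"0.5", "", "NA"}
	for i, raw := range values {
		v, present, err := parseNumericalField(raw)
		fmt.Println(header[i], v, present, err)
	}
}
```

A caller would typically store the value when `present` is true and mark the feature as missing otherwise.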

func (*Batch) SetMissingCategorical

func (batch *Batch) SetMissingCategorical(exampleIdx int, feature CategoricalFeatureID)

SetMissingCategorical sets a categorical feature value as missing.

func (*Batch) SetMissingNumerical

func (batch *Batch) SetMissingNumerical(exampleIdx int, feature NumericalFeatureID)

SetMissingNumerical sets a numerical feature value as missing.

func (*Batch) SetNumerical

func (batch *Batch) SetNumerical(exampleIdx int, feature NumericalFeatureID, value float32)

SetNumerical sets the value of a numerical feature.

func (*Batch) ToStringDebug

func (batch *Batch) ToStringDebug() string

ToStringDebug exports the content of the set of examples into a text-debug representation.

type CategoricalFeatureID

type CategoricalFeatureID int

CategoricalFeatureID is the unique identifier of a categorical feature.

type CategoricalSpec

type CategoricalSpec struct {
	// NumUniqueValues of this feature. The feature value should be in [0, NumUniqueValues).
	NumUniqueValues uint32
	// contains filtered or unexported fields
}

CategoricalSpec is the meta-data about a categorical feature.

type CompatibilityType

type CompatibilityType int32

CompatibilityType indicates how the model was trained, and it affects how features are consumed.

const (
	// CompatibilityYggdrasil is the native way to consume examples and models with Yggdrasil
	// Decision Forests.
	CompatibilityYggdrasil CompatibilityType = 0

	// CompatibilityTensorFlowDecisionForests consumes models trained with TensorFlow Decision
	// Forests.
	//
	// Compatibility impact: Categorical and categorical-set columns fed as integers are offset by
	// 1. See "CATEGORICAL_INTEGER_OFFSET" in TensorFlow Decision Forests.
	CompatibilityTensorFlowDecisionForests CompatibilityType = 1

	// CompatibilityAutoTFX consumes models trained with TensorFlow Decision
	// Forests.
	//
	// Compatibility impact: Categorical and categorical-set columns fed as integers are offset by
	// 1. See "CATEGORICAL_INTEGER_OFFSET" in TensorFlow Decision Forests. Missing numerical and
	// categorical string values are replaced respectively by -1 and "" (empty string).
	CompatibilityAutoTFX CompatibilityType = 2

	// CompatibilityAutomatic automatically detects the compatibility of the model.
	CompatibilityAutomatic = 3
)
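To make the "offset by 1" remark concrete, here is a hedged sketch of how a raw categorical integer might be adjusted depending on the compatibility mode. It assumes the offset is added to the raw value before it reaches the model. The type `compat`, its constants, and `toModelCategorical` are hypothetical local names, not the package's API.

```go
package main

import "fmt"

// compat is a hypothetical local enum mirroring the three explicit
// compatibility modes described above.
type compat int

const (
	compatYggdrasil compat = iota
	compatTFDF
	compatAutoTFX
)

// categoricalIntegerOffset is the assumed value of the offset applied to
// categorical integers for models trained with TensorFlow Decision Forests.
const categoricalIntegerOffset = 1

// toModelCategorical returns the value fed to the model for a raw
// categorical integer. Assumption: the offset is added, not subtracted.
func toModelCategorical(raw uint32, c compat) uint32 {
	switch c {
	case compatTFDF, compatAutoTFX:
		return raw + categoricalIntegerOffset
	default:
		return raw
	}
}

func main() {
	fmt.Println(toModelCategorical(3, compatYggdrasil)) // prints "3"
	fmt.Println(toModelCategorical(3, compatTFDF))      // prints "4"
}
```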

type FeatureConstructionMap

type FeatureConstructionMap struct {
	// Mapping between a column index (i.e. the index of the column in the
	// dataspec) and a NumericalFeatureID.
	NumericalFeatures map[int]NumericalFeatureID

	// Mapping between a column index (in the dataspec) and a
	// CategoricalFeatureID.
	CategoricalFeatures map[int]CategoricalFeatureID
}

FeatureConstructionMap contains the mapping between column indices and feature ids. FeatureConstructionMap is only used during the model-to-engine compilation and is then discarded.

type Features

type Features struct {
	// NumericalFeatures is the mapping between numerical feature names and numerical feature ids.
	// Keyed by feature name.
	NumericalFeatures map[string]NumericalFeatureID
	// CategoricalFeatures is the mapping between categorical feature names and categorical feature
	// ids. Keyed by feature name.
	CategoricalFeatures map[string]CategoricalFeatureID

	// MissingNumericalValues is the representation of a "missing value" for each of the numerical
	// features.
	// Note: Currently, serving only supports global imputation of missing values
	// during inference.
	MissingNumericalValues []float32
	// MissingCategoricalValues is the representation of a "missing value" for each of the categorical
	// features.
	MissingCategoricalValues []uint32

	// CategoricalSpec is the meta-data about the categorical features. Indexed by
	// "CategoricalFeatureID".
	CategoricalSpec []CategoricalSpec

	// Compatibility indicates how the model is served.
	Compatibility CompatibilityType
}

Features contains the definition of the input features of a model.

func (*Features) NumFeatures

func (f *Features) NumFeatures() int

NumFeatures is the number of features.

func (*Features) OverrideMissingValuePlaceholders

func (f *Features) OverrideMissingValuePlaceholders(numerical float32, categorical string)

OverrideMissingValuePlaceholders specifies the values that will replace the missing numerical and categorical values when calling SetMissing* during inference.

Models are natively able to handle missing values. Overriding the missing values is a form of data pre-processing that should only be applied if such pre-processing is also applied during training.
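The placeholder mechanism can be pictured with the sketch below. It models only the numerical side: a per-feature slice of "missing value" representations, an override that replaces all of them, and a setter that writes the placeholder into an example. The type `miniFeatures`, its methods, and the default values are hypothetical and are not taken from the package.

```go
package main

import "fmt"

// miniFeatures is a hypothetical stand-in holding one "missing value"
// representation per numerical feature.
type miniFeatures struct {
	missingNumerical []float32
}

// overrideMissingNumerical replaces the placeholder of every numerical
// feature with the same user-provided value.
func (f *miniFeatures) overrideMissingNumerical(placeholder float32) {
	for i := range f.missingNumerical {
		f.missingNumerical[i] = placeholder
	}
}

// setMissingNumerical marks a feature of one example as missing by writing
// the current placeholder of that feature.
func setMissingNumerical(row []float32, f *miniFeatures, featureID int) {
	row[featureID] = f.missingNumerical[featureID]
}

func main() {
	// Assumed per-feature defaults, chosen only for this example.
	f := &miniFeatures{missingNumerical: []float32{0.25, 12}}
	row := []float32{1.5, 2.5}

	f.overrideMissingNumerical(-1)
	setMissingNumerical(row, f, 1)
	fmt.Println(row) // prints "[1.5 -1]"
}
```

As the text above notes, such an override only makes sense when the same replacement was applied to the training data.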

type NumericalFeatureID

type NumericalFeatureID int

NumericalFeatureID is the unique identifier of a numerical feature.
