Documentation ¶
Overview ¶
Package example defines "Batch": a batch of examples; and "Features": the specification of the input features of a model.
Index ¶
- Constants
- func GetColumn(name string, dataspec *dataspec_pb.DataSpecification) *dataspec_pb.Column
- func NewFeatures(dataspec *dataspec_pb.DataSpecification, header *model_pb.AbstractModel, ...) (*Features, *FeatureConstructionMap, error)
- type Batch
- func (batch *Batch) Clear()
- func (batch *Batch) CopyFrom(src *Batch, beginIdx int, endIdx int)
- func (batch *Batch) FillMissing()
- func (batch *Batch) NumAllocatedExamples() int
- func (batch *Batch) SetCategorical(exampleIdx int, feature CategoricalFeatureID, value uint32)
- func (batch *Batch) SetCategoricalFromString(exampleIdx int, feature CategoricalFeatureID, rawValue string) error
- func (batch *Batch) SetFromFields(exampleIdx int, header []string, values []string) error
- func (batch *Batch) SetMissingCategorical(exampleIdx int, feature CategoricalFeatureID)
- func (batch *Batch) SetMissingNumerical(exampleIdx int, feature NumericalFeatureID)
- func (batch *Batch) SetNumerical(exampleIdx int, feature NumericalFeatureID, value float32)
- func (batch *Batch) ToStringDebug() string
- type CategoricalFeatureID
- type CategoricalSpec
- type CompatibilityType
- type FeatureConstructionMap
- type Features
- type NumericalFeatureID
Constants ¶
const OutOfVabulary = uint32(0)
OutOfVabulary (OOV) is the special value that represents unknown or too-rare categorical values.
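As a minimal sketch of what reserving index 0 for out-of-vocabulary values implies, the snippet below maps raw strings to integer indices and falls back to 0 for anything unknown. The vocabulary map, the `outOfVocabulary` constant, and `lookupCategorical` are hypothetical names used for illustration; they are not part of this package.

```go
package main

import "fmt"

// outOfVocabulary mirrors the documented constant: index 0 is reserved for
// unknown or too-rare categorical values. (Local, hypothetical copy.)
const outOfVocabulary = uint32(0)

// lookupCategorical is a hypothetical helper that returns the integer index
// of a raw string value, or outOfVocabulary if the value is not in vocab.
func lookupCategorical(vocab map[string]uint32, raw string) uint32 {
	if idx, ok := vocab[raw]; ok {
		return idx
	}
	return outOfVocabulary
}

func main() {
	// Known values start at 1 because 0 is reserved for OOV.
	vocab := map[string]uint32{"UK": 1, "FR": 2}
	fmt.Println(lookupCategorical(vocab, "UK"), lookupCategorical(vocab, "unseen"))
}
```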
Variables ¶
This section is empty.
Functions ¶
func GetColumn ¶
func GetColumn(name string, dataspec *dataspec_pb.DataSpecification) *dataspec_pb.Column
GetColumn gets the column spec from its name.
func NewFeatures ¶
func NewFeatures(dataspec *dataspec_pb.DataSpecification, header *model_pb.AbstractModel, compatibility CompatibilityType) (*Features, *FeatureConstructionMap, error)
NewFeatures converts a dataspec into a feature definition used by an engine.
Types ¶
type Batch ¶
type Batch struct {
	// {Example major, feature minor} values for the unary feature values.
	NumericalValues   []float32
	CategoricalValues []uint32
	// contains filtered or unexported fields
}
Batch is a set of examples.
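The "example major, feature minor" layout mentioned in the struct comment can be illustrated with a small sketch: all the feature values of one example are stored contiguously, followed by those of the next example. The `flatIndex` helper is an assumption made for illustration and is not part of this package.

```go
package main

import "fmt"

// flatIndex returns the position of (exampleIdx, featureIdx) in a slice that
// is laid out example-major, feature-minor. Hypothetical helper.
func flatIndex(exampleIdx, featureIdx, numFeatures int) int {
	return exampleIdx*numFeatures + featureIdx
}

func main() {
	const numExamples, numFeatures = 2, 3
	values := make([]float32, numExamples*numFeatures)

	// Write the value of feature 2 for example 1.
	values[flatIndex(1, 2, numFeatures)] = 0.5
	fmt.Println(values)
}
```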
func NewBatch ¶
NewBatch creates a batch of examples. The example values are in an undefined state: before being used, the feature values should be set either with "FillMissing" or "Set*".
func (*Batch) Clear ¶
func (batch *Batch) Clear()
Clear clears the content of a batch. After a call to Clear, the feature values are in an undefined state, i.e. the same state as after "NewBatch".
func (*Batch) CopyFrom ¶
CopyFrom copies the content of a batch from another batch. Assumes that both batches have the exact same features (e.g. they were created by the same engine).
func (*Batch) FillMissing ¶
func (batch *Batch) FillMissing()
FillMissing sets all the feature values of all the examples as missing.
This method is equivalent to, but more efficient than, calling the "SetMissing*" methods for all the features and all the examples.
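To make the idea concrete, here is a rough sketch of a bulk fill, assuming each feature has one placeholder value that stands for "missing" and that values are stored example-major, feature-minor. The `fillMissing` function and its arguments are hypothetical and do not reflect the package's actual implementation.

```go
package main

import "fmt"

// fillMissing writes the per-feature placeholder into every example of an
// example-major, feature-minor slice. Hypothetical helper: the real method
// operates on the unexported state of a Batch.
func fillMissing(values []float32, placeholders []float32) {
	numFeatures := len(placeholders)
	if numFeatures == 0 {
		return
	}
	for i := range values {
		// i%numFeatures is the feature index of the i-th stored value.
		values[i] = placeholders[i%numFeatures]
	}
}

func main() {
	placeholders := []float32{-1, 0.5} // assumed: one placeholder per feature
	values := make([]float32, 3*len(placeholders))
	fillMissing(values, placeholders)
	fmt.Println(values)
}
```

A single pass over the slice avoids the per-call overhead of setting each feature of each example individually.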
func (*Batch) NumAllocatedExamples ¶
NumAllocatedExamples is the number of allocated examples.
func (*Batch) SetCategorical ¶
func (batch *Batch) SetCategorical(exampleIdx int, feature CategoricalFeatureID, value uint32)
SetCategorical sets the value of a categorical feature as an integer.
func (*Batch) SetCategoricalFromString ¶
func (batch *Batch) SetCategoricalFromString(exampleIdx int, feature CategoricalFeatureID, rawValue string) error
SetCategoricalFromString sets the value of a categorical feature.
func (*Batch) SetFromFields ¶
SetFromFields sets all the fields of an example from CSV-like fields and a header. This method is slow and should not be used in speed-sensitive code.
Empty fields and fields with the value "NA" are considered missing values.
Example:
examples.SetFromFields(0, []string{"a", "b", "c"}, []string{"0.5", "UK", "NA"})
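The missing-value rule above can be sketched for a single numerical field as follows. The `parseNumericalField` helper is hypothetical and only illustrates the documented convention (empty string or "NA" means missing); how the package parses values internally is not shown here.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseNumericalField is a hypothetical helper. It reports a field as
// missing when it is empty or equal to "NA", and otherwise parses it as a
// 32-bit float.
func parseNumericalField(raw string) (value float32, missing bool, err error) {
	if raw == "" || raw == "NA" {
		return 0, true, nil
	}
	v, err := strconv.ParseFloat(raw, 32)
	if err != nil {
		return 0, false, fmt.Errorf("invalid numerical value %q: %w", raw, err)
	}
	return float32(v), false, nil
}

func main() {
	header := []string{"a", "c"}
	fields := []string{"0.5", "NA"}
	for i, raw := range fields {
		value, missing, err := parseNumericalField(raw)
		fmt.Println(header[i], value, missing, err)
	}
}
```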
func (*Batch) SetMissingCategorical ¶
func (batch *Batch) SetMissingCategorical(exampleIdx int, feature CategoricalFeatureID)
SetMissingCategorical sets a categorical feature value as missing.
func (*Batch) SetMissingNumerical ¶
func (batch *Batch) SetMissingNumerical(exampleIdx int, feature NumericalFeatureID)
SetMissingNumerical sets a numerical feature value as missing.
func (*Batch) SetNumerical ¶
func (batch *Batch) SetNumerical(exampleIdx int, feature NumericalFeatureID, value float32)
SetNumerical sets the value of a numerical feature.
func (*Batch) ToStringDebug ¶
ToStringDebug exports the content of the set of examples into a text-debug representation.
type CategoricalFeatureID ¶
type CategoricalFeatureID int
CategoricalFeatureID is the unique identifier of a categorical feature.
type CategoricalSpec ¶
type CategoricalSpec struct {
	// NumUniqueValues of this feature. The feature value should be in [0, NumUniqueValues).
	NumUniqueValues uint32
	// contains filtered or unexported fields
}
CategoricalSpec is the meta-data about a categorical feature.
type CompatibilityType ¶
type CompatibilityType int32
CompatibilityType indicates how the model was trained, and it affects how features are consumed.
const (
	// CompatibilityYggdrasil is the native way to consume examples and models with Yggdrasil
	// Decision Forests.
	CompatibilityYggdrasil CompatibilityType = 0

	// CompatibilityTensorFlowDecisionForests consumes models trained with TensorFlow Decision
	// Forests.
	//
	// Compatibility impact: Categorical and categorical-set columns fed as integers are offset by
	// 1. See "CATEGORICAL_INTEGER_OFFSET" in TensorFlow Decision Forests.
	CompatibilityTensorFlowDecisionForests CompatibilityType = 1

	// CompatibilityAutoTFX consumes models trained with TensorFlow Decision
	// Forests.
	//
	// Compatibility impact: Categorical and categorical-set columns fed as integers are offset by
	// 1. See "CATEGORICAL_INTEGER_OFFSET" in TensorFlow Decision Forests. Missing numerical and
	// categorical string values are replaced respectively by -1 and "" (empty string).
	CompatibilityAutoTFX CompatibilityType = 2

	// CompatibilityAutomatic automatically detects the compatibility of the model.
	CompatibilityAutomatic = 3
)
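The offset described in the comments can be pictured with the sketch below. It assumes that the offset is added to the integer value provided by the caller, so that index 0 stays reserved; the direction of the offset, the local `compatibility` type, and `storedCategorical` are all assumptions made for illustration, not the package's confirmed behavior.

```go
package main

import "fmt"

// compatibility is a local, hypothetical stand-in for CompatibilityType.
type compatibility int32

const (
	compatYggdrasil compatibility = 0
	compatTFDF      compatibility = 1
	compatAutoTFX   compatibility = 2
)

// categoricalIntegerOffset is assumed to be 1, following the "offset by 1"
// wording of the documentation.
const categoricalIntegerOffset = uint32(1)

// storedCategorical returns the value as it would be stored under the
// assumption that the offset is added for the TensorFlow-based modes.
func storedCategorical(value uint32, c compatibility) uint32 {
	switch c {
	case compatTFDF, compatAutoTFX:
		return value + categoricalIntegerOffset
	default:
		return value
	}
}

func main() {
	fmt.Println(storedCategorical(3, compatYggdrasil), storedCategorical(3, compatTFDF))
}
```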
type FeatureConstructionMap ¶
type FeatureConstructionMap struct {
	// Mapping between a column index (i.e. the index of the column in the
	// dataspec) and a NumericalFeatureID.
	NumericalFeatures map[int]NumericalFeatureID

	// Mapping between a column index (in the dataspec) and a
	// CategoricalFeatureID.
	CategoricalFeatures map[int]CategoricalFeatureID
}
FeatureConstructionMap contains the mapping between column indices and feature ids. FeatureConstructionMap is only used during the model-to-engine compilation and is then discarded.
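One way such a mapping could be built is sketched below: each input column receives the next free id within its own type, so numerical and categorical ids are numbered independently. The `columnKind` type, `assignFeatureIDs`, and the assumption that ids are dense and assigned in input order are illustrative only and are not confirmed by this documentation.

```go
package main

import "fmt"

// columnKind is a hypothetical stand-in for the column type in a dataspec.
type columnKind int

const (
	numericalColumn columnKind = iota
	categoricalColumn
)

// assignFeatureIDs maps each input column index to a dense id within its
// own kind. It assumes inputColumns contains no duplicates.
func assignFeatureIDs(kinds []columnKind, inputColumns []int) (numerical, categorical map[int]int) {
	numerical = map[int]int{}
	categorical = map[int]int{}
	for _, col := range inputColumns {
		switch kinds[col] {
		case numericalColumn:
			numerical[col] = len(numerical) // next free numerical id
		case categoricalColumn:
			categorical[col] = len(categorical) // next free categorical id
		}
	}
	return numerical, categorical
}

func main() {
	kinds := []columnKind{numericalColumn, categoricalColumn, numericalColumn}
	numerical, categorical := assignFeatureIDs(kinds, []int{0, 1, 2})
	fmt.Println(numerical, categorical)
}
```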
type Features ¶
type Features struct {
	// NumericalFeatures is the mapping between numerical feature names and numerical feature ids.
	// Indexed by "NumericalFeatureID".
	NumericalFeatures map[string]NumericalFeatureID

	// CategoricalFeatures is the mapping between categorical feature names and categorical feature
	// ids. Indexed by "CategoricalFeatureID".
	CategoricalFeatures map[string]CategoricalFeatureID

	// MissingNumericalValues is the representation of a "missing value" for each of the numerical
	// features.
	// Note: Currently, serving only supports global imputation of missing values
	// during inference.
	MissingNumericalValues []float32

	// MissingCategoricalValues is the representation of a "missing value" for each of the categorical
	// features.
	MissingCategoricalValues []uint32

	// CategoricalSpec is the meta-data about the categorical features. Indexed by
	// "CategoricalFeatureID".
	CategoricalSpec []CategoricalSpec

	// Compatibility indicates how the model is served.
	Compatibility CompatibilityType
}
Features contains the definition of the input features of a model.
func (*Features) NumFeatures ¶
NumFeatures is the number of features.
func (*Features) OverrideMissingValuePlaceholders ¶
OverrideMissingValuePlaceholders specifies the values that will replace the missing numerical and categorical values when calling SetMissing* during inference.
Models are natively able to handle missing values. Overriding the missing values is a form of data pre-processing that should only be applied if such pre-processing is also applied during training.
type NumericalFeatureID ¶
type NumericalFeatureID int
NumericalFeatureID is the unique identifier of a numerical feature.