Documentation ¶
Overview ¶
Package dataset provides a basic dataset type with functionality for detecting inter-quartile range (IQR) outliers.
Index ¶
Constants ¶
const Subsystem = "DSET"
Subsystem defines the logging code for this subsystem.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type Dataset ¶
Dataset contains information about a set of float64 data points.
func (Dataset) GetOutliers ¶
func (d Dataset) GetOutliers(outlierMultiplier float64) (map[string]*OutlierResult, error)
GetOutliers returns a map from the labels in the dataset to outlier results, which indicate whether the associated value is an upper or lower inter-quartile outlier. If there are too few values to calculate inter-quartile outliers, it returns results with both flags set to false for every data point.
An outlier multiplier determines how strictly outliers are classified: lower values identify more outliers (a stricter classification), and higher values identify fewer outliers (a less strict one). Multipliers of 1.5 or less are considered to identify "weak outliers", because the flagged values are still relatively close to the rest of the dataset. Multipliers of 3 or more are considered to identify "strong outliers", because they flag only values that are far from the rest of the dataset.
The effect of this value is illustrated in the example below. Given the following data, with lower quartile (LQ) = 5 and upper quartile (UQ) = 6, the inter-quartile range is 1:
          LQ        UQ
    [ 1 2 5 5 5 6 6 6 8 11 ]
For larger multipliers, e.g. multiplier=3, we detect fewer outliers:
    Strong lower outlier bound: 5 - (1 * 3) = 2
    -> 1 is a strong lower outlier
    Strong upper outlier bound: 6 + (1 * 3) = 9
    -> 11 is a strong upper outlier
For smaller multipliers, e.g. multiplier=1.5, we detect more outliers:
    Weak lower outlier bound: 5 - (1 * 1.5) = 3.5
    -> 1 and 2 are weak lower outliers
    Weak upper outlier bound: 6 + (1 * 1.5) = 7.5
    -> 8 and 11 are weak upper outliers
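The classification described for GetOutliers can be sketched in Go as follows. This is a minimal sketch, not the package's implementation: getOutliers is a hypothetical stand-in that takes a plain map[string]float64 instead of a Dataset and omits the error return. It assumes quartiles are computed as the medians of the lower and upper halves (which reproduces LQ = 5 and UQ = 6 for the example data), uses a hypothetical minimum of four values, and treats only values strictly beyond a bound as outliers. outlierValues is a hypothetical helper that collects the flagged values.

```go
package main

import (
	"fmt"
	"sort"
)

// OutlierResult mirrors the fields of the package's OutlierResult type.
type OutlierResult struct {
	UpperOutlier bool
	LowerOutlier bool
}

// minValues is a hypothetical minimum; the package's real cut-off is not documented here.
const minValues = 4

// median returns the median of a sorted, non-empty slice.
func median(sorted []float64) float64 {
	n := len(sorted)
	if n%2 == 1 {
		return sorted[n/2]
	}
	return (sorted[n/2-1] + sorted[n/2]) / 2
}

// getOutliers is a hypothetical stand-in for Dataset.GetOutliers that
// takes a plain label -> value map.
func getOutliers(data map[string]float64, multiplier float64) map[string]*OutlierResult {
	results := make(map[string]*OutlierResult, len(data))
	for label := range data {
		results[label] = &OutlierResult{}
	}
	if len(data) < minValues {
		// Too few values: both flags stay false for every label.
		return results
	}

	values := make([]float64, 0, len(data))
	for _, v := range data {
		values = append(values, v)
	}
	sort.Float64s(values)

	// Quartiles as medians of the lower and upper halves; for odd n the
	// middle value is left out of both halves.
	half := len(values) / 2
	lq := median(values[:half])
	uq := median(values[len(values)-half:])
	iqr := uq - lq

	lowerBound := lq - iqr*multiplier
	upperBound := uq + iqr*multiplier
	for label, v := range data {
		results[label].LowerOutlier = v < lowerBound
		results[label].UpperOutlier = v > upperBound
	}
	return results
}

// outlierValues returns the sorted values flagged as lower and upper outliers.
func outlierValues(data map[string]float64, results map[string]*OutlierResult) (lower, upper []float64) {
	for label, r := range results {
		if r.LowerOutlier {
			lower = append(lower, data[label])
		}
		if r.UpperOutlier {
			upper = append(upper, data[label])
		}
	}
	sort.Float64s(lower)
	sort.Float64s(upper)
	return lower, upper
}

func main() {
	// The example dataset from the text, keyed by arbitrary labels.
	data := map[string]float64{
		"a": 1, "b": 2, "c": 5, "d": 5, "e": 5,
		"f": 6, "g": 6, "h": 6, "i": 8, "j": 11,
	}
	for _, m := range []float64{3, 1.5} {
		lower, upper := outlierValues(data, getOutliers(data, m))
		fmt.Printf("multiplier=%v lower=%v upper=%v\n", m, lower, upper)
	}
}
```

Running it prints the strong outliers [1] and [11] for multiplier=3, and the weak outliers [1 2] and [8 11] for multiplier=1.5, matching the worked example.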
func (Dataset) GetThreshold ¶
GetThreshold returns the values in a dataset that are either <= or > a given threshold. A bool argument toggles whether it identifies the values at or below the threshold, or the values above it.
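A minimal sketch of this threshold split, assuming the dataset is a label-to-value map. Because the exact signature of GetThreshold is not shown here, getThreshold and its below parameter are hypothetical names, and returning the selected label/value pairs as a map is likewise an assumption.

```go
package main

import "fmt"

// getThreshold is a hypothetical stand-in for Dataset.GetThreshold. With
// below=true it keeps values <= threshold; otherwise it keeps values > threshold.
func getThreshold(data map[string]float64, threshold float64, below bool) map[string]float64 {
	selected := make(map[string]float64)
	for label, v := range data {
		if (below && v <= threshold) || (!below && v > threshold) {
			selected[label] = v
		}
	}
	return selected
}

func main() {
	data := map[string]float64{"a": 1, "b": 5, "c": 6, "d": 11}
	fmt.Println(getThreshold(data, 5, true))  // values <= 5
	fmt.Println(getThreshold(data, 5, false)) // values > 5
}
```

Using <= on one side and > on the other means every value lands in exactly one of the two results, so a value equal to the threshold is never reported twice.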
type OutlierResult ¶
type OutlierResult struct {
	// UpperOutlier is true if the value is an upper outlier in the dataset.
	UpperOutlier bool

	// LowerOutlier is true if the value is a lower outlier in the dataset.
	LowerOutlier bool
}
OutlierResult holds the results of an outlier check.