dataset

package
v0.2.13-alpha
Warning

This package is not in the latest version of its module.

Published: Feb 21, 2024 License: MIT Imports: 4 Imported by: 1

Documentation

Overview

Package dataset provides a basic dataset type which provides functionality for detecting inter-quartile range outliers.

Constants

const Subsystem = "DSET"

Subsystem defines the logging code for this subsystem.

Variables

This section is empty.

Functions

func UseLogger

func UseLogger(logger btclog.Logger)

UseLogger uses a specified Logger to output package logging info. This should be used in preference to SetLogWriter if the caller is also using btclog.

Types

type Dataset

type Dataset map[string]float64

Dataset contains information about a set of float64 data points.

func New

func New(valueMap map[string]float64) Dataset

New takes a map of labels to values and returns it as a dataset.

func (Dataset) GetOutliers

func (d Dataset) GetOutliers(outlierMultiplier float64) (
	map[string]*OutlierResult, error)

GetOutliers returns a map of the labels in the dataset to outlier results, which indicate whether the associated value is an upper or lower inter-quartile outlier. If there are too few values to calculate inter-quartile outliers, it will return false values for all data points.

An outlier multiplier is provided to determine how strictly we classify outliers; lower values will identify more outliers, thus being more strict, and higher values will identify fewer outliers, thus being less strict. Multipliers less than 1.5 are considered to provide "weak outliers", because the values are still relatively close to the rest of the dataset. Multipliers more than 3 are considered to provide "strong outliers", because they identify values that are far from the rest of the dataset.

The effect of this value is illustrated in the example below: Given some random set of data, with lower quartile = 5 and upper quartile = 6, the inter-quartile range is 1.

        LQ             UQ
[ 1  2  5  5  5  6  6  6  8 11 ]

For larger values, e.g. multiplier=3, we will detect fewer outliers:

Lower outlier bound: 5 - (1 * 3) = 2
-> 1 is a strong lower outlier

Upper outlier bound: 6 + (1 * 3) = 9
-> 11 is a strong upper outlier

For smaller values, e.g. multiplier=1.5, we detect more outliers:

Weak lower outlier bound: 5 - (1 * 1.5) = 3.5
-> 1 and 2 are weak lower outliers

Weak upper outlier bound: 6 + (1 * 1.5) = 7.5
-> 8 and 11 are weak upper outliers
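The arithmetic in the example above can be reproduced with a short standalone sketch. It does not call this package: the helpers bounds and classify are hypothetical names, the quartiles (5 and 6) are taken as given from the example rather than computed, and the strict comparisons (< and >) are an assumption that happens to match the outliers listed above.

```go
package main

import "fmt"

// bounds returns the lower and upper outlier bounds for the given quartiles
// and multiplier: LQ - IQR*multiplier and UQ + IQR*multiplier.
func bounds(lowerQuartile, upperQuartile, multiplier float64) (float64, float64) {
	iqr := upperQuartile - lowerQuartile
	return lowerQuartile - iqr*multiplier, upperQuartile + iqr*multiplier
}

// classify returns the values strictly below the lower bound and the values
// strictly above the upper bound, in input order. Strict comparison is an
// assumption; the package may treat values equal to a bound differently.
func classify(values []float64, lower, upper float64) (low, high []float64) {
	for _, v := range values {
		switch {
		case v < lower:
			low = append(low, v)
		case v > upper:
			high = append(high, v)
		}
	}
	return low, high
}

func main() {
	// Data and quartiles from the example: LQ = 5, UQ = 6, IQR = 1.
	data := []float64{1, 2, 5, 5, 5, 6, 6, 6, 8, 11}

	for _, multiplier := range []float64{3, 1.5} {
		lower, upper := bounds(5, 6, multiplier)
		low, high := classify(data, lower, upper)
		fmt.Printf("multiplier=%v bounds=(%v, %v) lower outliers=%v upper outliers=%v\n",
			multiplier, lower, upper, low, high)
	}
}
```

Halving the multiplier from 3 to 1.5 narrows the bounds from (2, 9) to (3.5, 7.5), which is why 2 and 8 are flagged only in the second case.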

func (Dataset) GetThreshold

func (d Dataset) GetThreshold(thresholdValue float64, below bool) map[string]bool

GetThreshold returns the set of values in a dataset <= or > a given threshold. The below bool is used to toggle whether we identify values above or below the threshold.
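One plausible reading of this description is sketched below. It is not the package's implementation: the local Dataset type only mirrors the documented declaration, the function name threshold is hypothetical, and the pairing of below=true with <= (and below=false with >) is an assumption inferred from the wording above.

```go
package main

import "fmt"

// Dataset mirrors the documented type: labels mapped to float64 values.
type Dataset map[string]float64

// threshold is a hypothetical stand-in for GetThreshold. It assumes that
// below=true marks values <= thresholdValue and below=false marks values
// > thresholdValue, with every label present in the returned map.
func threshold(d Dataset, thresholdValue float64, below bool) map[string]bool {
	result := make(map[string]bool, len(d))
	for label, value := range d {
		if below {
			result[label] = value <= thresholdValue
		} else {
			result[label] = value > thresholdValue
		}
	}
	return result
}

func main() {
	d := Dataset{"a": 1, "b": 5, "c": 9}

	fmt.Println("at or below 5:", threshold(d, 5, true))
	fmt.Println("above 5:", threshold(d, 5, false))
}
```

Under this reading a value equal to the threshold counts as "below", so the two calls partition the dataset with no label true in both maps.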

func (Dataset) Value

func (d Dataset) Value(label string) float64

Value returns the value that a label is associated with in a set.

type OutlierResult

type OutlierResult struct {
	// UpperOutlier is true if the value is an upper outlier in the dataset.
	UpperOutlier bool

	// LowerOutlier is true if the value is a lower outlier in the dataset.
	LowerOutlier bool
}

OutlierResult holds the results of an outlier check.
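As a rough illustration of how a per-label result map of this shape might be filled in and read, the sketch below redeclares the struct locally and uses a hypothetical helper, checkBounds, that takes precomputed outlier bounds. The bounds (2 and 9), the labels, and the strict comparisons are all assumptions made for the example, not behavior taken from this package.

```go
package main

import "fmt"

// OutlierResult mirrors the documented struct.
type OutlierResult struct {
	// UpperOutlier is true if the value is an upper outlier in the dataset.
	UpperOutlier bool

	// LowerOutlier is true if the value is a lower outlier in the dataset.
	LowerOutlier bool
}

// checkBounds is a hypothetical helper that records, for every label,
// whether its value lies strictly above the upper bound or strictly below
// the lower bound. Every label gets an entry, even if both flags are false.
func checkBounds(values map[string]float64, lower, upper float64) map[string]*OutlierResult {
	results := make(map[string]*OutlierResult, len(values))
	for label, value := range values {
		results[label] = &OutlierResult{
			UpperOutlier: value > upper,
			LowerOutlier: value < lower,
		}
	}
	return results
}

func main() {
	values := map[string]float64{"low": 1, "mid": 5, "high": 11}

	// Map iteration order is not deterministic, so lines may print in any order.
	for label, result := range checkBounds(values, 2, 9) {
		fmt.Printf("%s: upper=%t lower=%t\n", label, result.UpperOutlier, result.LowerOutlier)
	}
}
```

Because the map holds pointers, a caller that looks up a label which is not in the dataset gets nil and should check for that before reading the fields.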
