outliers

command
v0.13.0
Published: May 12, 2016 License: MIT Imports: 5 Imported by: 0

README

Outlier UDF Examples

Each example in this directory should implement a simple outlier detection algorithm.

The algorithm:

Find outliers via the Tukey method. Return all points that lie outside the lower and upper bounds.

The bounds are defined by:

Lower Bound = 1st Quartile - IQR * SCALE
Upper Bound = 3rd Quartile + IQR * SCALE

where

IQR = 3rd Quartile - 1st Quartile
SCALE = a user-defined value >= 1.0
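As a worked illustration of the bound formulas, here is a minimal Go sketch. The function name tukeyBounds is hypothetical and not part of any example in this directory; it assumes the quartiles have already been computed.

```go
package main

// tukeyBounds is a hypothetical helper that applies the formulas above:
// IQR = Q3 - Q1, lower = Q1 - IQR*scale, upper = Q3 + IQR*scale.
// It assumes q1 <= q3 and scale >= 1.0 and does not validate either.
func tukeyBounds(q1, q3, scale float64) (lower, upper float64) {
	iqr := q3 - q1
	return q1 - iqr*scale, q3 + iqr*scale
}
```

For example, with a 1st quartile of 2, a 3rd quartile of 6 and a scale of 1.5, the IQR is 4, so the bounds are -4 and 12.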

To implement this method one must first find the 1st and 3rd quartiles and then compute the lower and upper bounds.

The quartiles are to be calculated via the median method. First compute the median of the entire data set. Then compute the median of each half of the data set, with neither half containing the median. The medians of the lower and upper halves are the first and third quartiles, respectively. To compute the median of an even number of points, return the arithmetic mean of the two middle points.
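The median method described above can be sketched in Go roughly as follows. The names median and quartiles are hypothetical, and returning 0 for an empty slice is an assumption made only to keep the sketch short; a real implementation would likely report an error for batches with fewer than two points.

```go
package main

import "sort"

// median returns the median of an already sorted slice.
// For an even number of points it returns the arithmetic mean
// of the two middle points. An empty slice yields 0 (assumption).
func median(sorted []float64) float64 {
	n := len(sorted)
	if n == 0 {
		return 0
	}
	mid := n / 2
	if n%2 == 1 {
		return sorted[mid]
	}
	return (sorted[mid-1] + sorted[mid]) / 2
}

// quartiles returns the 1st and 3rd quartiles via the median method.
// The input is copied before sorting so the caller's slice is untouched.
func quartiles(values []float64) (q1, q3 float64) {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)

	n := len(sorted)
	half := n / 2
	// For odd n the middle point (the median) is excluded from both halves.
	// For even n the median is not a data point, so the halves split evenly.
	lower := sorted[:half]
	upper := sorted[n-half:]
	return median(lower), median(upper)
}
```

With the seven points 1 through 7, the median is 4, the halves are 1, 2, 3 and 5, 6, 7, and the quartiles are 2 and 6.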

The UDF should implement the above algorithm and follow these specifications.

  • Wants and provides a batch edge.
  • For each batch received, return a batch containing only the outlier points, unmodified.
  • Has a field option that specifies which field to operate on.
  • Has a scale option that specifies a multiplier applied to the IQR value. Default is 1.5.
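To make the specification concrete, here is a minimal Go sketch of the batch filtering step. It deliberately does not use any real UDF agent API: the point and options types, defaultOptions and filterOutliers are all hypothetical stand-ins, and skipping points that lack the configured field is an assumption the specification does not address.

```go
package main

import "sort"

// point is a hypothetical stand-in for one data point in a batch.
type point struct {
	Fields map[string]float64
}

// options mirrors the two options in the specification (hypothetical type).
type options struct {
	Field string  // which field to operate on
	Scale float64 // multiplier applied to the IQR
}

// defaultOptions uses the default scale of 1.5 from the specification.
func defaultOptions(field string) options {
	return options{Field: field, Scale: 1.5}
}

// median of a sorted, non-empty slice; mean of the two middle points if even.
func median(sorted []float64) float64 {
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return sorted[mid]
	}
	return (sorted[mid-1] + sorted[mid]) / 2
}

// filterOutliers returns only the points whose field value lies outside
// the Tukey bounds. Points are returned unmodified and in batch order.
func filterOutliers(batch []point, opts options) []point {
	values := make([]float64, 0, len(batch))
	for _, p := range batch {
		if v, ok := p.Fields[opts.Field]; ok {
			values = append(values, v)
		}
	}
	if len(values) < 2 {
		return nil // assumption: too few points to define quartiles
	}
	sort.Float64s(values)

	half := len(values) / 2
	q1 := median(values[:half])
	q3 := median(values[len(values)-half:])
	iqr := q3 - q1
	lower, upper := q1-iqr*opts.Scale, q3+iqr*opts.Scale

	var out []point
	for _, p := range batch {
		if v, ok := p.Fields[opts.Field]; ok && (v < lower || v > upper) {
			out = append(out, p)
		}
	}
	return out
}
```

Calling filterOutliers with defaultOptions("value") on a batch whose values are 1, 2, 3, 4, 5, 6, 7 and 100 gives quartiles 2.5 and 6.5, bounds -3.5 and 12.5, and a result containing only the point with value 100.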
