model

package
v0.0.0-...-eee4e15
Published: Jan 6, 2025 License: Apache-2.0 Imports: 2 Imported by: 0

Documentation

Overview

Package model contains common types used in changepoint analysis.

Constants

const TailLikelihoodsLength = 17

TailLikelihoodsLength is the number of points the changepoint position probability distribution is reduced to by PositionDistribution.

Variables

var TailLikelihoods = [TailLikelihoodsLength]float64{
	0.0005,
	0.005,
	0.025,
	0.05,
	0.10,
	0.15,
	0.20,
	0.25,
	0.5,
	0.75,
	0.80,
	0.85,
	0.90,
	0.95,
	0.975,
	0.995,
	0.9995,
}

TailLikelihoods defines the points on the changepoint position cumulative distribution function that PositionDistribution will store.

That is, these are the values of P(C <= k | Y) for which k is recorded, where:

  • C is the random variable representing the (unknown) true changepoint position.
  • k is the upper bound on the changepoint position.
  • Y is the test history.

We do not store the exact changepoint position distribution in full fidelity, because that would require storing all test results, which is expensive in memory. Instead, we pick a set of points that may be useful later and store only those.

The current design allows common one- and two-tailed intervals (e.g. 99.9%, 99%, 95%, 90%, etc.) to be calculated, and allows the entire CDF to be reconstructed with an absolute error of not more than 2.5%, to support probability density heatmaps.

IMPORTANT: Do not change these points unless you handle data compatibility for old distributions that are already stored. The points MUST be in ascending order and MUST be symmetric around the middle, i.e. TailLikelihoods[i] = 1 - TailLikelihoods[(TailLikelihoodsLength-1) - i].
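The two invariants above lend themselves to a quick check. The following is a minimal sketch, not part of the package: checkTailLikelihoods is a hypothetical helper, the array is copied here under a lowercase name so the example is self-contained, and "ascending" is assumed to mean strictly ascending.

```go
package main

import (
	"fmt"
	"math"
)

// Copy of the documented points, so that the example compiles on its own.
var tailLikelihoods = [17]float64{
	0.0005, 0.005, 0.025, 0.05, 0.10, 0.15, 0.20, 0.25,
	0.5,
	0.75, 0.80, 0.85, 0.90, 0.95, 0.975, 0.995, 0.9995,
}

// checkTailLikelihoods is a hypothetical helper (not in the package) that
// verifies the two documented invariants: ascending order and symmetry
// around the middle.
func checkTailLikelihoods(p []float64) error {
	n := len(p)
	for i := 1; i < n; i++ {
		// Assumption: "ascending" is read as strictly ascending.
		if p[i] <= p[i-1] {
			return fmt.Errorf("not ascending at index %d", i)
		}
	}
	for i := 0; i < n; i++ {
		// Compare with a small tolerance, since 1 - x is not exact in float64.
		if math.Abs(p[i]-(1-p[n-1-i])) > 1e-9 {
			return fmt.Errorf("not symmetric at index %d", i)
		}
	}
	return nil
}

func main() {
	if err := checkTailLikelihoods(tailLikelihoods[:]); err != nil {
		panic(err)
	}
	fmt.Println("TailLikelihoods invariants hold")
}
```

A check of this kind could sit in a unit test next to the array, so that an accidental edit to the points fails fast rather than silently corrupting stored distributions.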

Functions

This section is empty.

Types

type PositionDistribution

type PositionDistribution [TailLikelihoodsLength]int64

PositionDistribution represents the distribution of possible change point start (commit) positions. It is a quantization (i.e. a sampled, compressed form) of the true distribution.

To be precise, for each of a range of left-tail probabilities x_0, x_1, ... (defined in TailLikelihoods above), it stores the upper bound source position k_i on the (unknown) true changepoint source position C such that P(C <= k_i) is approximately equal to x_i.

x < 0.5 represents the left half of the distribution, and x > 0.5 represents the right half. Note that the value of k that gives P(C <= k) closest to 0.5 is not necessarily the same as the nominal start position of the change point, in much the same way that the median of a distribution is not necessarily the same as its mode.
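To make the quantization concrete, here is a minimal sketch of how a full CDF could be reduced to 17 stored positions. It is an illustration under stated assumptions, not the package's implementation: quantize is a hypothetical function, the inputs (a non-empty, ascending list of positions with their cumulative probabilities) are invented, and the rounding rule (smallest k with P(C <= k) >= x) is one plausible reading of "approximately equal".

```go
package main

import (
	"fmt"
	"sort"
)

// Copy of the documented points, so that the example compiles on its own.
var tailLikelihoods = [17]float64{
	0.0005, 0.005, 0.025, 0.05, 0.10, 0.15, 0.20, 0.25,
	0.5,
	0.75, 0.80, 0.85, 0.90, 0.95, 0.975, 0.995, 0.9995,
}

// quantize is a hypothetical helper. positions must be non-empty and
// ascending, and cdf[j] must hold P(C <= positions[j]), non-decreasing
// and ending at 1.
func quantize(positions []int64, cdf []float64) [17]int64 {
	var out [17]int64
	for i, x := range tailLikelihoods {
		// Assumed rounding rule: take the smallest position whose
		// cumulative probability reaches x.
		j := sort.Search(len(cdf), func(k int) bool { return cdf[k] >= x })
		if j == len(cdf) {
			j = len(cdf) - 1 // guard against a CDF that stops short of x
		}
		out[i] = positions[j]
	}
	return out
}

func main() {
	// Invented example data: five candidate commit positions.
	positions := []int64{100, 101, 102, 103, 104}
	cdf := []float64{0.01, 0.20, 0.60, 0.90, 1.00}
	fmt.Println(quantize(positions, cdf))
}
```

With this toy input, the middle entry (x = 0.5) is position 102, the first position whose cumulative probability reaches one half.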

func PositionDistributionFromProto

func PositionDistributionFromProto(v []int64) *PositionDistribution

PositionDistributionFromProto creates a PositionDistribution from its proto representation.

func SimpleDistribution

func SimpleDistribution(center int64, width int64) *PositionDistribution

SimpleDistribution returns a distribution that has the given center commit position and in which 99% of the data falls within the given width of the center. It is provided to simplify writing test cases that require a position distribution.

func (PositionDistribution) ConfidenceInterval

func (d PositionDistribution) ConfidenceInterval(y float64) (min int64, max int64)

ConfidenceInterval returns the (y*100)% two-tailed confidence interval for the change point start position.

E.g. y = 0.99 gives the 99% confidence interval (with left and right tails having probability ~ 0.005 each).

The interpretation of (min, max) is as follows: there is a probability of at least y that the change which represents the start of the 'new' behaviour lies between commit positions min and max, inclusive.

y shall be between 0.0 and 0.999.
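The lookup described above can be sketched as follows. This is a hedged illustration rather than the package's code: confidenceInterval and positionDistribution are hypothetical lowercase stand-ins, the panic on out-of-range y is an assumption, and the rule of picking the largest stored tail that does not exceed (1-y)/2 is one way to satisfy the "at least y" guarantee.

```go
package main

import "fmt"

const n = 17

// Copy of the documented points, so that the example compiles on its own.
var tailLikelihoods = [n]float64{
	0.0005, 0.005, 0.025, 0.05, 0.10, 0.15, 0.20, 0.25,
	0.5,
	0.75, 0.80, 0.85, 0.90, 0.95, 0.975, 0.995, 0.9995,
}

// positionDistribution is a stand-in for the documented type.
type positionDistribution [n]int64

// confidenceInterval is a hypothetical sketch of a two-tailed lookup.
func confidenceInterval(d positionDistribution, y float64) (lo, hi int64) {
	if y < 0.0 || y > 0.999 {
		panic("y must be between 0.0 and 0.999") // assumed behaviour
	}
	// Tolerance for float rounding: (1-0.9)/2 evaluates to slightly
	// less than 0.05 in float64.
	const eps = 1e-9
	tail := (1 - y) / 2
	idx := 0
	// Pick the largest stored left tail not exceeding the target tail,
	// so the returned interval covers at least y.
	for i := 0; i <= n/2; i++ {
		if tailLikelihoods[i] <= tail+eps {
			idx = i
		}
	}
	// Symmetry of the points gives the matching right tail.
	return d[idx], d[n-1-idx]
}

func main() {
	// Invented distribution: positions 100, 110, ..., 260.
	var d positionDistribution
	for i := range d {
		d[i] = 100 + 10*int64(i)
	}
	lo, hi := confidenceInterval(d, 0.99)
	fmt.Println(lo, hi)
}
```

Because only 17 points are stored, a request such as y = 0.98 falls between stored tails; under the rule assumed here it is served by the wider 99% interval.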

func (*PositionDistribution) Serialize

func (d *PositionDistribution) Serialize() []int64

Serialize serializes the position distribution to its proto representation.
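A round trip between the array type and its []int64 proto form might look like the sketch below. The names serialize and fromProto are hypothetical lowercase stand-ins, and returning an error on a length mismatch is an assumption made for the example (the documented PositionDistributionFromProto returns only a pointer, and its behaviour on malformed input is not stated here).

```go
package main

import "fmt"

const n = 17

// positionDistribution is a stand-in for the documented type.
type positionDistribution [n]int64

// serialize copies the fixed-size array into a fresh slice, so that
// callers cannot mutate the distribution through the result.
func (d *positionDistribution) serialize() []int64 {
	out := make([]int64, n)
	copy(out, (*d)[:])
	return out
}

// fromProto rebuilds the array from a slice. Rejecting a wrong length
// with an error is an assumption of this sketch.
func fromProto(v []int64) (*positionDistribution, error) {
	if len(v) != n {
		return nil, fmt.Errorf("got %d points, want %d", len(v), n)
	}
	var d positionDistribution
	copy(d[:], v)
	return &d, nil
}

func main() {
	var d positionDistribution
	for i := range d {
		d[i] = 100 + int64(i)
	}
	restored, err := fromProto(d.serialize())
	if err != nil {
		panic(err)
	}
	fmt.Println(*restored == d)
}
```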
