Documentation ¶
Overview ¶
Package model contains common types used in changepoint analysis.
Constants ¶
const TailLikelihoodsLength = 17
TailLikelihoodsLength is the number of points the changepoint position probability distribution is reduced to by PositionDistribution.
Variables ¶
var TailLikelihoods = [TailLikelihoodsLength]float64{
	0.0005,
	0.005,
	0.025,
	0.05,
	0.10,
	0.15,
	0.20,
	0.25,
	0.5,
	0.75,
	0.80,
	0.85,
	0.90,
	0.95,
	0.975,
	0.995,
	0.9995,
}
TailLikelihoods defines the points on the changepoint position cumulative distribution function that PositionDistribution will store.
I.e. the values of P(C <= k | Y) for which to record k, where:
- C is the random variable representing the (unknown) true changepoint position
- k is the upper bound on the changepoint position
- Y is the test history
We do not wish to store the exact changepoint position distribution in full fidelity, as that would require storing all test results, which is expensive in memory. Instead, we pick a set of points that we may want to use later and store only those.
The current design allows for common one and two-tailed intervals (e.g. 99.9%, 99%, 95%, 90%, etc.) to be calculated, and for the entire CDF to be re-constructed with absolute error of not more than 2.5%, to allow for probability density heatmaps.
IMPORTANT: Do not change these points unless you handle data compatibility for old distributions that are already stored. The points MUST be in ascending order and MUST be symmetric around the middle, i.e. TailLikelihoods[i] = 1 - TailLikelihoods[(TailLikelihoodsLength-1) - i].
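The two ordering requirements above can be checked mechanically. The following is a minimal sketch, not code from the package: `tailLikelihoods` is a local copy of the values so the example compiles on its own, `checkTailLikelihoods` is a hypothetical helper, and "ascending" is assumed to mean strictly ascending. The symmetry check uses a small tolerance because `1 - x` is not always exactly representable in floating point.

```go
package main

import (
	"fmt"
	"math"
)

// tailLikelihoods is a local copy of the documented values, so that this
// sketch is self-contained.
var tailLikelihoods = [17]float64{
	0.0005, 0.005, 0.025, 0.05, 0.10, 0.15, 0.20, 0.25,
	0.5,
	0.75, 0.80, 0.85, 0.90, 0.95, 0.975, 0.995, 0.9995,
}

// checkTailLikelihoods is a hypothetical helper (not part of the package).
// It returns an error if the points are not strictly ascending or are not
// symmetric around the middle.
func checkTailLikelihoods(p [17]float64) error {
	const eps = 1e-12 // tolerance for floating-point rounding in 1 - x
	n := len(p)
	for i := 1; i < n; i++ {
		if p[i] <= p[i-1] {
			return fmt.Errorf("not ascending at index %d", i)
		}
	}
	for i := 0; i < n; i++ {
		if math.Abs(p[i]-(1-p[n-1-i])) > eps {
			return fmt.Errorf("not symmetric at index %d", i)
		}
	}
	return nil
}

func main() {
	if err := checkTailLikelihoods(tailLikelihoods); err != nil {
		panic(err)
	}
	fmt.Println("ok")
}
```

A check of this kind could live in a unit test, so that an accidental edit to the points fails fast instead of silently corrupting the interpretation of stored distributions.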
Functions ¶
This section is empty.
Types ¶
type PositionDistribution ¶
type PositionDistribution [TailLikelihoodsLength]int64
PositionDistribution represents the distribution of possible change point start (commit) positions. It is a quantization (i.e. a sampled, compressed form) of the true distribution.
To be precise, for a range of left-tail probabilities x_0, x_1, ... (defined in TailLikelihoods above) it stores, for each x_i, the upper bound source position k on the (unknown) true changepoint source position C such that P(C <= k) is approximately equal to x_i.
Points with x_i < 0.5 represent the left half of the distribution, and points with x_i > 0.5 represent the right half. Note that the value of k that gives P(C <= k) closest to 0.5 is not necessarily the same as the nominal start position of the change point, in much the same way that the median of a distribution is not the same as its mode.
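To make the quantization concrete, here is a rough sketch of how a full per-position probability vector could be reduced to 17 stored positions. It is an illustration under stated assumptions, not the package's algorithm: `quantize` and `tailLikelihoods` are hypothetical local names, `probs` is assumed to be non-empty and to sum to 1, and each point records the smallest k with P(C <= k) >= x_i (one possible reading of "approximately equal").

```go
package main

import "fmt"

// tailLikelihoods is a local copy of the documented values.
var tailLikelihoods = [17]float64{
	0.0005, 0.005, 0.025, 0.05, 0.10, 0.15, 0.20, 0.25,
	0.5,
	0.75, 0.80, 0.85, 0.90, 0.95, 0.975, 0.995, 0.9995,
}

// quantize is a hypothetical helper. probs[j] is the probability that the
// changepoint is at position start+j. For each tail likelihood x it records
// the smallest position k whose cumulative probability reaches x.
func quantize(start int64, probs []float64) [17]int64 {
	var out [17]int64
	cum := 0.0
	i := 0
	for j, p := range probs {
		cum += p
		k := start + int64(j)
		// One position may satisfy several tail likelihoods at once.
		for i < len(out) && cum >= tailLikelihoods[i] {
			out[i] = k
			i++
		}
	}
	// Guard against rounding: any points not yet reached get the last position.
	for ; i < len(out); i++ {
		out[i] = start + int64(len(probs)) - 1
	}
	return out
}

func main() {
	// The most likely single position (the mode) is 100, but the long
	// right tail pulls the 0.5 point one position to the right.
	probs := []float64{0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1}
	d := quantize(100, probs)
	fmt.Println("mode: 100, position stored for x = 0.5:", d[8])
}
```

In this example the position stored for x = 0.5 is 101 while the mode is 100, which is the median-versus-mode distinction described above.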
func PositionDistributionFromProto ¶
func PositionDistributionFromProto(v []int64) *PositionDistribution
PositionDistributionFromProto creates a PositionDistribution from its proto representation.
func SimpleDistribution ¶
func SimpleDistribution(center int64, width int64) *PositionDistribution
SimpleDistribution returns a distribution centered on the given commit position, with 99% of the probability falling within the given width of the center. It is provided to simplify writing test cases that require a position distribution.
func (PositionDistribution) ConfidenceInterval ¶
func (d PositionDistribution) ConfidenceInterval(y float64) (min int64, max int64)
ConfidenceInterval returns the (y*100)% two-tailed confidence interval for the change point start position.
E.g. y = 0.99 gives the 99% confidence interval (with left and right tails having probability ~ 0.005 each).
The interpretation of (min, max) is as follows: there is a probability of at least y that the change which represents the start of the 'new' behaviour lies between commit positions min and max, inclusive.
y must be between 0.0 and 0.999.
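One plausible way to derive such an interval from the stored points is sketched below. This is not the package's implementation: `positionDistribution`, `tailLikelihoods` and `confidenceInterval` are local stand-ins, and the rule for choosing a stored point (the largest tail likelihood not exceeding (1 - y)/2, then its mirror index on the right) is an assumption chosen to be consistent with the "at least y" wording. Under that assumption the upper limit of 0.999 follows naturally, since the outermost stored points are 0.0005 and 0.9995.

```go
package main

import "fmt"

// tailLikelihoods is a local copy of the documented values.
var tailLikelihoods = [17]float64{
	0.0005, 0.005, 0.025, 0.05, 0.10, 0.15, 0.20, 0.25,
	0.5,
	0.75, 0.80, 0.85, 0.90, 0.95, 0.975, 0.995, 0.9995,
}

// positionDistribution is a local stand-in for PositionDistribution.
type positionDistribution [17]int64

// confidenceInterval is a hypothetical sketch. It picks the largest stored
// left-tail point that does not exceed (1-y)/2 and pairs it with the
// symmetric right-tail point, so the interval covers at least y.
func confidenceInterval(d positionDistribution, y float64) (lo, hi int64, err error) {
	if y < 0.0 || y > 0.999 {
		return 0, 0, fmt.Errorf("y must be between 0.0 and 0.999, got %v", y)
	}
	const eps = 1e-9 // absorbs rounding in (1-y)/2, e.g. for y = 0.9
	tail := (1 - y) / 2
	i := 0
	for i+1 < len(tailLikelihoods) && tailLikelihoods[i+1] <= tail+eps {
		i++
	}
	return d[i], d[len(d)-1-i], nil
}

func main() {
	// Illustrative data only: position 100+i stored for the i-th point.
	var d positionDistribution
	for i := range d {
		d[i] = 100 + int64(i)
	}
	lo, hi, err := confidenceInterval(d, 0.99)
	if err != nil {
		panic(err)
	}
	fmt.Println(lo, hi)
}
```

With this rule, a y that falls between two supported levels (say 0.98) is rounded outward to the next wider stored interval (here the 99% one), which keeps the "at least y" guarantee at the cost of a slightly wider range.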
func (*PositionDistribution) Serialize ¶
func (d *PositionDistribution) Serialize() []int64
Serialize serializes the position distribution to its proto representation.