compare

package
v0.0.0-...-8942fcd Latest
Warning

This package is not in the latest version of its module.

Published: Jan 13, 2025 License: BSD-3-Clause Imports: 8 Imported by: 0

Documentation


Constants

const DefaultFunctionalErrRate = 1.0

DefaultFunctionalErrRate is the default expected error rate for functional analysis. It is 1.0 for all bisections that pivot to functional analysis. Source: https://source.chromium.org/chromium/chromium/src/+/main:third_party/catapult/dashboard/dashboard/pinpoint/models/job_state.py;drc=94f2bff5159bf660910b35c39426102c5982c4a4;l=356

Variables

This section is empty.

Functions

func KolmogorovSmirnov

func KolmogorovSmirnov(a []float64, b []float64) (float64, error)

KolmogorovSmirnov computes the two-sample Kolmogorov-Smirnov test on samples a and b.
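To illustrate what the two-sample test measures, here is a minimal standalone sketch of the Kolmogorov-Smirnov statistic D, the largest gap between the two empirical CDFs. The helper name ksStatistic is hypothetical and is not part of this package. The sketch stops at the statistic and does not derive a p-value, so it should not be read as this package's implementation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// ksStatistic is a hypothetical helper (not part of this package). It returns
// the two-sample Kolmogorov-Smirnov statistic D: the largest absolute gap
// between the empirical CDFs of a and b. Inputs are assumed to be NaN-free.
func ksStatistic(a, b []float64) float64 {
	if len(a) == 0 || len(b) == 0 {
		return math.NaN()
	}
	// Sort copies so the caller's slices are left untouched.
	x := append([]float64(nil), a...)
	y := append([]float64(nil), b...)
	sort.Float64s(x)
	sort.Float64s(y)

	var d float64
	i, j := 0, 0
	for i < len(x) && j < len(y) {
		// Consume every observation equal to the smallest remaining value
		// in both samples, so that ties are stepped over together.
		v := math.Min(x[i], y[j])
		for i < len(x) && x[i] == v {
			i++
		}
		for j < len(y) && y[j] == v {
			j++
		}
		// i/len(x) and j/len(y) are the empirical CDFs evaluated at v.
		gap := math.Abs(float64(i)/float64(len(x)) - float64(j)/float64(len(y)))
		d = math.Max(d, gap)
	}
	return d
}

func main() {
	a := []float64{1, 2, 3, 4}
	b := []float64{3, 4, 5, 6}
	fmt.Println(ksStatistic(a, b)) // prints 0.5
}
```

A D near 0 means the two empirical distributions overlap closely, while a D of 1 means the samples do not overlap at all.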

func MannWhitneyU

func MannWhitneyU(a []float64, b []float64) float64

MannWhitneyU computes the Mann-Whitney rank test on samples a and b.
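As a rough illustration of the quantity behind the rank test, the sketch below computes the Mann-Whitney U statistic by direct pair counting. The helper name mannWhitneyUStatistic is hypothetical. This documentation does not say whether MannWhitneyU returns a statistic or a p-value, so this is only a sketch of the underlying idea, not a description of the function's return value.

```go
package main

import "fmt"

// mannWhitneyUStatistic is a hypothetical helper (not part of this package).
// It returns the U statistic of a against b: the number of pairs (x, y) with
// x from a and y from b where x > y, counting each tie as one half.
// This direct count is O(len(a)*len(b)); rank-based formulas are faster.
func mannWhitneyUStatistic(a, b []float64) float64 {
	var u float64
	for _, x := range a {
		for _, y := range b {
			switch {
			case x > y:
				u++
			case x == y:
				u += 0.5
			}
		}
	}
	return u
}

func main() {
	a := []float64{1, 2, 3}
	b := []float64{2, 3, 4}
	uA := mannWhitneyUStatistic(a, b)
	uB := mannWhitneyUStatistic(b, a)
	// The two statistics always sum to len(a)*len(b).
	fmt.Println(uA, uB) // prints 2 7
}
```

When the samples come from the same distribution, both U values tend toward half of len(a)*len(b); a strongly lopsided split suggests one sample tends to be larger.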

Types

type ComparePairwiseResult

type ComparePairwiseResult struct {
	// Verdict is the outcome of the statistical analysis which is Same or Different.
	// Note that pairwise does not have an Unknown verdict.
	Verdict Verdict
	// stats.PairwiseWilcoxonSignedRankedTestResult is the result of the Pairwise
	// statistical analysis.
	stats.PairwiseWilcoxonSignedRankedTestResult
}

ComparePairwiseResult contains the results of a pairwise comparison between two samples.

func ComparePairwise

func ComparePairwise(valuesA, valuesB []float64, dir ImprovementDir) (*ComparePairwiseResult, error)

ComparePairwise wraps PairwiseWilcoxonSignedRankedTest and reports a Same or Different verdict alongside the test result.

type CompareResults

type CompareResults struct {
	// Verdict is the outcome of the statistical analysis which is either
	// Unknown, Same, or Different.
	Verdict Verdict
	// PValue is the consolidated p-value for the statistical tests used.
	PValue float64
	// PValueKS is the p-value estimate from the KS test
	PValueKS float64
	// PValueMWU is the p-value estimate from the MWU test
	PValueMWU float64
	// LowThreshold is the `alpha` below which a p-value lets us
	// reject the null hypothesis.
	LowThreshold float64
	// HighThreshold is the `alpha` below which a p-value means we need
	// more information to make a definitive judgement.
	HighThreshold float64
	// MeanDiff is the difference between the mean of B and the mean of A.
	// MeanDiff > 0 means the mean of B > mean of A.
	// MeanDiff is used to decide if a difference is a regression or not.
	MeanDiff float64
	// IsTooSmall indicates that the regression is too small and a
	// comparison did not take place
	IsTooSmall bool
}

CompareResults contains the results of a comparison between two samples.

TODO(b/299537769): update verdict to use protos
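The field comments on LowThreshold and HighThreshold imply a three-way decision on the p-value. The sketch below spells that reading out. The helper verdictFor and the threshold values in main are hypothetical and chosen only for illustration. The types are redeclared locally so the block compiles on its own.

```go
package main

import "fmt"

// Verdict mirrors the package's Verdict type, redeclared here so that the
// sketch is self-contained.
type Verdict string

const (
	Unknown   Verdict = "Unknown"
	Same      Verdict = "Same"
	Different Verdict = "Different"
)

// verdictFor is a hypothetical helper (not part of this package). It follows
// the field comments on CompareResults: a p-value below the low threshold
// rejects the null hypothesis, a p-value below the high threshold needs more
// data, and anything else is treated as the same distribution.
func verdictFor(pValue, lowThreshold, highThreshold float64) Verdict {
	switch {
	case pValue < lowThreshold:
		return Different
	case pValue < highThreshold:
		return Unknown
	default:
		return Same
	}
}

func main() {
	// Illustrative thresholds only; the package derives its own values.
	const low, high = 0.01, 0.4
	for _, p := range []float64{0.001, 0.2, 0.9} {
		fmt.Println(p, verdictFor(p, low, high))
	}
}
```

With these made-up thresholds, the three p-values map to Different, Unknown, and Same respectively.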

func CompareFunctional

func CompareFunctional(valuesA, valuesB []float64, expectedErrRate float64) (*CompareResults, error)

CompareFunctional determines whether valuesA and valuesB are statistically different, statistically the same, or unknown, using the functional low and high thresholds. Functional analysis compares failure rates between A and B. The expectedErrRate expresses how much of the flakiness in a benchmark measurement the culprit CL is responsible for. For example, expectedErrRate = 0.5 means the culprit makes the benchmark fail in an additional 50% of runs.
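To make "failure rate" concrete, the sketch below assumes each run is encoded as 1.0 for a failure and 0.0 for a success. That encoding is an assumption made for illustration and is not stated in this documentation. The helper failureRate is hypothetical.

```go
package main

import "fmt"

// failureRate is a hypothetical helper (not part of this package). It assumes
// each value is 1.0 for a failed run and 0.0 for a successful run, so the
// mean of the slice is the fraction of runs that failed.
func failureRate(values []float64) float64 {
	if len(values) == 0 {
		return 0
	}
	var sum float64
	for _, v := range values {
		sum += v
	}
	return sum / float64(len(values))
}

func main() {
	valuesA := []float64{0, 0, 0, 0} // no failures before the change
	valuesB := []float64{1, 0, 1, 0} // half of the runs fail after it
	diff := failureRate(valuesB) - failureRate(valuesA)
	fmt.Println(diff) // prints 0.5
}
```

Under that assumed encoding, a difference of 0.5 between the two failure rates corresponds to the expectedErrRate = 0.5 case described above.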

func ComparePerformance

func ComparePerformance(valuesA, valuesB []float64, rawMagnitude float64, direction ImprovementDir) (*CompareResults, error)

ComparePerformance determines whether valuesA and valuesB are statistically different, statistically the same, or unknown, based on the perceived rawMagnitude difference between them and using the performance low and high thresholds.

type ImprovementDir

type ImprovementDir string

ImprovementDir is the improvement direction of the metric being measured. The direction is either up, down, or unknown.

const (
	// UnknownDir means the job request did not send an improvement
	// direction. Rather than infer it, we assume the direction
	// is unknown and drill deeper on all statistically significant
	// changes.
	UnknownDir ImprovementDir = "UnknownDir"
	// Up means the improvement direction is increasing.
	Up ImprovementDir = "Up"
	// Down means the improvement direction is decreasing.
	Down ImprovementDir = "Down"
)

These ImprovementDirs are the possible improvement directions.
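One way to combine an improvement direction with CompareResults.MeanDiff (the mean of B minus the mean of A) is sketched below. The helper shouldInvestigate is hypothetical and only reflects the comments on these types; the package's own regression logic may differ.

```go
package main

import "fmt"

// ImprovementDir mirrors the package's type, redeclared here so that the
// sketch is self-contained.
type ImprovementDir string

const (
	UnknownDir ImprovementDir = "UnknownDir"
	Up         ImprovementDir = "Up"
	Down       ImprovementDir = "Down"
)

// shouldInvestigate is a hypothetical helper (not part of this package).
// meanDiff is mean(B) - mean(A). A change counts as a regression when it
// moves against the improvement direction. With UnknownDir, any change is
// worth drilling into.
func shouldInvestigate(meanDiff float64, dir ImprovementDir) bool {
	switch dir {
	case Up:
		// Higher is better, so a drop from A to B is a regression.
		return meanDiff < 0
	case Down:
		// Lower is better, so a rise from A to B is a regression.
		return meanDiff > 0
	default:
		return meanDiff != 0
	}
}

func main() {
	fmt.Println(shouldInvestigate(-2.5, Up))        // prints true
	fmt.Println(shouldInvestigate(-2.5, Down))      // prints false
	fmt.Println(shouldInvestigate(1.0, UnknownDir)) // prints true
}
```

This check would only be meaningful after the statistical comparison has already returned a Different verdict.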

type Verdict

type Verdict string

Verdict is the outcome of a statistical comparison.

const (
	// Unknown means that there is not enough evidence to reject
	// either hypothesis. Collect more data before making a final decision.
	Unknown Verdict = "Unknown"
	// Same means that the samples likely come from the same distribution.
	// Cannot reject the null hypothesis.
	Same Verdict = "Same"
	// Different means that the samples are unlikely to come
	// from the same distribution. Reject the null hypothesis.
	Different Verdict = "Different"
	// NilVerdict means there was no analysis to be done.
	// This can happen in performance comparisons when all
	// benchmark runs fail and there is no data to analyze.
	NilVerdict Verdict = "Nil"
	// ErrorVerdict means something went wrong with the analysis.
	// Returning this verdict is better than returning a nil struct.
	ErrorVerdict Verdict = "Error"
)

These verdicts are the possible results of the statistical analysis.
