compare

package

v0.0.0-...-8b74f12 Published: Feb 21, 2025 License: BSD-3-Clause Imports: 8 Imported by: 0

Repository

skia.googlesource.com/buildbot

Documentation ¶

Index ¶

Constants
func KolmogorovSmirnov(a []float64, b []float64) (float64, error)
func MannWhitneyU(a []float64, b []float64) float64
type ComparePairwiseResult
- func ComparePairwise(valuesA, valuesB []float64, dir ImprovementDir) (*ComparePairwiseResult, error)
type CompareResults
- func CompareFunctional(valuesA, valuesB []float64, expectedErrRate float64) (*CompareResults, error)
- func ComparePerformance(valuesA, valuesB []float64, rawMagnitude float64, direction ImprovementDir) (*CompareResults, error)
type ImprovementDir
type Verdict

Constants ¶

const DefaultFunctionalErrRate = 1.0

Based on https://source.chromium.org/chromium/chromium/src/+/main:third_party/catapult/dashboard/dashboard/pinpoint/models/job_state.py;drc=94f2bff5159bf660910b35c39426102c5982c4a4;l=356, the default expected error rate for functional analysis is 1.0 for all bisections that pivot to functional analysis.

Variables ¶

This section is empty.

Functions ¶

func KolmogorovSmirnov ¶

func KolmogorovSmirnov(a []float64, b []float64) (float64, error)

KolmogorovSmirnov computes the 2-sample Kolmogorov-Smirnov test on samples a and b.

func MannWhitneyU ¶

func MannWhitneyU(a []float64, b []float64) float64

MannWhitneyU computes the Mann-Whitney rank test on samples a and b.
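As an illustration of what the rank test measures, the sketch below computes the Mann-Whitney U statistic for the first sample by counting pairs directly. The helper name `uStatistic` is hypothetical, and this is not the package's implementation: the page does not say whether the package returns U itself or a p-value derived from it, so the sketch covers only the statistic.

```go
package main

import "fmt"

// uStatistic returns the Mann-Whitney U statistic for sample a: the number
// of (x, y) pairs with x from a and y from b where x > y, counting each tie
// as one half. Hypothetical helper for illustration; it runs in
// O(len(a)*len(b)) time and does not compute a p-value.
func uStatistic(a, b []float64) float64 {
	var u float64
	for _, x := range a {
		for _, y := range b {
			switch {
			case x > y:
				u++
			case x == y:
				u += 0.5
			}
		}
	}
	return u
}

func main() {
	a := []float64{1, 2, 3}
	b := []float64{2, 3, 4}
	ua := uStatistic(a, b)
	ub := uStatistic(b, a)
	// The two statistics always sum to len(a)*len(b).
	fmt.Println(ua, ub, ua+ub) // prints "2 7 9"
}
```

When the two samples come from the same distribution, U is expected to sit near len(a)*len(b)/2; values near 0 or near len(a)*len(b) indicate that one sample tends to be larger than the other.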

Types ¶

type ComparePairwiseResult ¶

type ComparePairwiseResult struct {
	// Verdict is the outcome of the statistical analysis which is Same or Different.
	// Note that pairwise does not have an Unknown verdict.
	Verdict Verdict
	// stats.PairwiseWilcoxonSignedRankedTestResult is the result of the Pairwise
	// statistical analysis.
	stats.PairwiseWilcoxonSignedRankedTestResult
}

ComparePairwiseResult contains the results of a pairwise comparison between two samples.

func ComparePairwise ¶

func ComparePairwise(valuesA, valuesB []float64, dir ImprovementDir) (*ComparePairwiseResult, error)

ComparePairwise wraps PairwiseWilcoxonSignedRankedTest.
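The wrapped test lives in the stats subpackage and its implementation is not shown on this page. As a rough sketch of the idea behind a Wilcoxon signed-rank test, the code below pairs valuesA[i] with valuesB[i], ranks the absolute differences, and sums the ranks by sign. The helper name `signedRankSums`, the pairing by index, and the choice to drop zero differences are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// signedRankSums returns the Wilcoxon signed-rank sums for paired samples:
// wPlus is the rank sum of pairs where b[i] > a[i], wMinus of pairs where
// b[i] < a[i]. Zero differences are dropped and tied absolute differences
// share their average rank. Hypothetical helper; no p-value is computed.
func signedRankSums(a, b []float64) (wPlus, wMinus float64, err error) {
	if len(a) != len(b) {
		return 0, 0, fmt.Errorf("paired samples must have equal length: %d vs %d", len(a), len(b))
	}
	type diff struct {
		abs      float64
		positive bool
	}
	var ds []diff
	for i := range a {
		d := b[i] - a[i]
		if d == 0 {
			continue
		}
		ds = append(ds, diff{abs: math.Abs(d), positive: d > 0})
	}
	sort.Slice(ds, func(i, j int) bool { return ds[i].abs < ds[j].abs })

	for i := 0; i < len(ds); {
		// Find the run [i, j) of equal absolute differences.
		j := i + 1
		for j < len(ds) && ds[j].abs == ds[i].abs {
			j++
		}
		// The run occupies ranks i+1 .. j; every member gets the average.
		rank := float64(i+1+j) / 2
		for k := i; k < j; k++ {
			if ds[k].positive {
				wPlus += rank
			} else {
				wMinus += rank
			}
		}
		i = j
	}
	return wPlus, wMinus, nil
}

func main() {
	a := []float64{10, 12, 9, 11}
	b := []float64{12, 15, 8, 11}
	wPlus, wMinus, err := signedRankSums(a, b)
	if err != nil {
		panic(err)
	}
	fmt.Println(wPlus, wMinus) // prints "5 1"
}
```

A large imbalance between the two sums suggests a consistent shift between the paired measurements, which is what a Different verdict would reflect.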

type CompareResults ¶

type CompareResults struct {
	// Verdict is the outcome of the statistical analysis which is either
	// Unknown, Same, or Different.
	Verdict Verdict
	// PValue is the consolidated p-value for the statistical tests used.
	PValue float64
	// PValueKS is the p-value estimate from the KS test
	PValueKS float64
	// PValueMWU is the p-value estimate from the MWU test
	PValueMWU float64
	// LowThreshold is the `alpha` below which the p-value lets us reject
	// the null hypothesis.
	LowThreshold float64
	// HighThreshold is the `alpha` below which the p-value means we need
	// more information to make a definitive judgement.
	HighThreshold float64
	// MeanDiff is the difference between the mean of B and the mean of A.
	// MeanDiff > 0 means the mean of B > mean of A.
	// MeanDiff is used to decide if a difference is a regression or not.
	MeanDiff float64
	// IsTooSmall indicates that the regression is too small and a
	// comparison did not take place.
	IsTooSmall bool
}

CompareResults contains the results of a comparison between two samples.

TODO(b/299537769): update verdict to use protos

func CompareFunctional ¶

func CompareFunctional(valuesA, valuesB []float64, expectedErrRate float64) (*CompareResults, error)

CompareFunctional determines whether valuesA and valuesB are statistically different, statistically the same, or unknown, using the functional low and high thresholds. Functional analysis compares failure rates between A and B. The expectedErrRate expresses how much of the flakiness in a benchmark measurement the culprit CL is responsible for. For example, expectedErrRate = 0.5 means the culprit makes the benchmark fail in an additional 50% of runs.

func ComparePerformance ¶

func ComparePerformance(valuesA, valuesB []float64, rawMagnitude float64, direction ImprovementDir) (*CompareResults, error)

ComparePerformance determines whether valuesA and valuesB are statistically different, statistically the same, or unknown, based on the perceived rawMagnitude difference between them, using the performance low and high thresholds.
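The LowThreshold and HighThreshold comments on CompareResults suggest a three-way decision on the p-value. The sketch below spells that out under stated assumptions: `verdictFor` is a hypothetical helper, the `Verdict` type is redeclared locally so the example is self-contained, the strict `<` comparisons at the boundaries are a guess, and the threshold values 0.01 and 0.3 are illustrative rather than the ones the thresholds subpackage produces.

```go
package main

import "fmt"

// Verdict mirrors the package's string enum so this example compiles alone.
type Verdict string

const (
	Unknown   Verdict = "Unknown"
	Same      Verdict = "Same"
	Different Verdict = "Different"
)

// verdictFor maps a p-value onto a verdict using a low and a high threshold.
// Hypothetical helper; boundary handling is an assumption.
func verdictFor(pValue, low, high float64) Verdict {
	switch {
	case pValue < low:
		// Strong evidence against the null hypothesis.
		return Different
	case pValue < high:
		// Not enough evidence either way; collect more data.
		return Unknown
	default:
		// Cannot reject the null hypothesis.
		return Same
	}
}

func main() {
	const low, high = 0.01, 0.3 // illustrative thresholds only
	fmt.Println(verdictFor(0.001, low, high)) // prints "Different"
	fmt.Println(verdictFor(0.1, low, high))   // prints "Unknown"
	fmt.Println(verdictFor(0.8, low, high))   // prints "Same"
}
```

The middle band is what lets a bisection ask for more runs instead of committing early to Same or Different.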

type ImprovementDir ¶

type ImprovementDir string

ImprovementDir is the improvement direction of the metric being measured. The direction is either up, down, or unknown.

const (
	// UnknownDir means the job request did not send an improvement
	// direction. Rather than infer it, we assume the direction
	// is unknown and drill deeper on all statistically significant
	// changes.
	UnknownDir ImprovementDir = "UnknownDir"
	// Up means the improvement direction is increasing.
	Up ImprovementDir = "Up"
	// Down means the improvement direction is decreasing.
	Down ImprovementDir = "Down"
)

These ImprovementDirs are the possible improvement directions.
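The MeanDiff field of CompareResults is documented as the mean of B minus the mean of A and as the input for deciding whether a difference is a regression. One way that decision could combine with the improvement direction is sketched below. `isRegression` is a hypothetical helper, the `ImprovementDir` type is redeclared locally so the example is self-contained, and treating every nonzero difference as worth investigating under UnknownDir is an interpretation of the comment on that constant.

```go
package main

import "fmt"

// ImprovementDir mirrors the package's string enum so this example compiles alone.
type ImprovementDir string

const (
	UnknownDir ImprovementDir = "UnknownDir"
	Up         ImprovementDir = "Up"
	Down       ImprovementDir = "Down"
)

// isRegression reports whether a mean difference (mean of B minus mean of A)
// moves against the improvement direction. Hypothetical helper.
func isRegression(meanDiff float64, dir ImprovementDir) bool {
	switch dir {
	case Up:
		// Higher is better, so B being lower than A is a regression.
		return meanDiff < 0
	case Down:
		// Lower is better, so B being higher than A is a regression.
		return meanDiff > 0
	default:
		// Direction unknown: assume any change deserves a closer look.
		return meanDiff != 0
	}
}

func main() {
	fmt.Println(isRegression(-2.5, Up))        // prints "true"
	fmt.Println(isRegression(-2.5, Down))      // prints "false"
	fmt.Println(isRegression(1.0, UnknownDir)) // prints "true"
}
```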

type Verdict ¶

type Verdict string

Verdict is the outcome of a statistical analysis.

const (
	// Unknown means that there is not enough evidence to reject
	// either hypothesis. Collect more data before making a final decision.
	Unknown Verdict = "Unknown"
	// Same means that the samples likely come from the same distribution.
	// Cannot reject the null hypothesis.
	Same Verdict = "Same"
	// Different means that the samples are unlikely to come
	// from the same distribution. Reject the null hypothesis.
	Different Verdict = "Different"
	// NilVerdict means there was no analysis to be done.
	// This can happen in performance comparisons when all
	// benchmark runs fail and there is no data to analyze.
	NilVerdict Verdict = "Nil"
	// ErrorVerdict means something went wrong with the analysis.
	// Returning this verdict is better than returning a nil struct.
	ErrorVerdict Verdict = "Error"
)

These verdicts are the possible results of the statistical analysis.

Directories ¶

Path	Synopsis
stats
thresholds
