clustering

Version: v0.0.0-...-141b21d
This package is not in the latest version of its module.
Published: Dec 19, 2024 | License: Apache-2.0 | Imports: 10 | Imported by: 0

Documentation

Constants

const AlgorithmRePattern = `[0-9a-z\-]{1,26}-v[1-9][0-9]{0,3}`

AlgorithmRePattern is the regular expression pattern matching validly formed clustering algorithm names. The overarching requirement is [0-9a-z\-]{1,32}, which we subdivide into an algorithm name of up to 26 characters, followed by "-v" and a version number of up to four digits with no leading zero (26 + 2 + 4 = 32 characters).

const FailureReasonAlgorithmPrefix = "reason-"

FailureReasonAlgorithmPrefix is the algorithm name prefix used by all versions of the failure reason clustering algorithm.

const MaxClusterIDBytes = 16

MaxClusterIDBytes is the maximum number of bytes the algorithm-determined cluster ID may occupy. This is the raw number of bytes; if the ID is hex-encoded (e.g. for use in a BigQuery table), its length in characters is twice its length in bytes, so at most 32 characters.
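
To make the byte/character distinction concrete, here is a sketch that hex-encodes a raw ID with the standard library. The helper hexClusterID is hypothetical and not part of this package; it only illustrates that 16 raw bytes become 32 hex characters.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// maxClusterIDBytes mirrors the MaxClusterIDBytes constant documented above.
const maxClusterIDBytes = 16

// hexClusterID is a hypothetical helper: it rejects raw IDs longer than the
// limit and otherwise hex-encodes them (two lowercase characters per byte).
func hexClusterID(raw []byte) (string, error) {
	if len(raw) > maxClusterIDBytes {
		return "", fmt.Errorf("cluster ID is %d bytes, exceeds %d", len(raw), maxClusterIDBytes)
	}
	return hex.EncodeToString(raw), nil
}

func main() {
	raw := make([]byte, maxClusterIDBytes)
	for i := range raw {
		raw[i] = byte(i * 17) // arbitrary sample bytes: 0x00, 0x11, ..., 0xff
	}
	encoded, err := hexClusterID(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded, len(encoded)) // prints: 00112233445566778899aabbccddeeff 32
}
```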

const RulesAlgorithmPrefix = "rules-"

RulesAlgorithmPrefix is the algorithm name prefix used by all versions of the rules-based clustering algorithm.

const TestNameAlgorithmPrefix = "testname-"

TestNameAlgorithmPrefix is the algorithm name prefix used by all versions of the test name clustering algorithm.

Variables

var AlgorithmRe = regexp.MustCompile(`^` + AlgorithmRePattern + `$`)

AlgorithmRe matches validly formed clustering algorithm names.

var ChunkRe = regexp.MustCompile(`^[0-9a-f]{1,32}$`)

ChunkRe matches validly formed chunk IDs.

Functions

func AlgorithmsAndClustersEqual

func AlgorithmsAndClustersEqual(a *ClusterResults, b *ClusterResults) bool

AlgorithmsAndClustersEqual returns whether the algorithms and clusters of two cluster results are equivalent.
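
One plausible reading of this comparison, sketched with a trimmed, hypothetical stand-in for ClusterResults (clusterResults) that keeps only the two fields named by the function. This sketch compares the algorithm sets as sets and the per-test-result cluster lists element-wise, and ignores the version fields; the package's actual implementation may differ. It needs Go 1.21+ for the maps and slices packages.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
)

// ClusterID mirrors the documented ClusterID type (JSON tags omitted).
type ClusterID struct {
	Algorithm string
	ID        string
}

// clusterResults is a hypothetical, trimmed stand-in for ClusterResults that
// keeps only the two fields this comparison looks at.
type clusterResults struct {
	Algorithms map[string]struct{}
	Clusters   [][]ClusterID
}

// algorithmsAndClustersEqual compares the algorithm sets and then, test result
// by test result, the cluster lists. Version fields are not considered.
func algorithmsAndClustersEqual(a, b *clusterResults) bool {
	if !maps.Equal(a.Algorithms, b.Algorithms) {
		return false
	}
	// ClusterID is a comparable struct, so slices.Equal works on each inner slice.
	return slices.EqualFunc(a.Clusters, b.Clusters, func(x, y []ClusterID) bool {
		return slices.Equal(x, y)
	})
}

func main() {
	newResults := func(id string) *clusterResults {
		return &clusterResults{
			Algorithms: map[string]struct{}{"reason-v3": {}},
			Clusters:   [][]ClusterID{{{Algorithm: "reason-v3", ID: id}}},
		}
	}
	fmt.Println(algorithmsAndClustersEqual(newResults("0a"), newResults("0a"))) // true
	fmt.Println(algorithmsAndClustersEqual(newResults("0a"), newResults("ff"))) // false
}
```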

func ClustersAreSortedNoDuplicates

func ClustersAreSortedNoDuplicates(cs []ClusterID) bool

ClustersAreSortedNoDuplicates verifies that clusters are in sorted order and there are no duplicate clusters.

func ClustersEqual

func ClustersEqual(as []ClusterID, bs []ClusterID) bool

ClustersEqual returns whether the clusters in `as` are element-wise equal to those in `bs`. To test set-wise cluster equality, call this function with both slices in sorted order and free of duplicates.
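
The difference between element-wise and set-wise equality can be sketched as follows. The functions here are illustrative stand-ins written for this example (sameClusterSet in particular is a hypothetical helper), and the ordering by algorithm then ID follows the SortClusters description. It needs Go 1.21+ for cmp and slices.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

type ClusterID struct {
	Algorithm string
	ID        string
}

// clustersEqual is element-wise: same length and the same cluster at every index.
func clustersEqual(as, bs []ClusterID) bool {
	if len(as) != len(bs) {
		return false
	}
	for i := range as {
		if as[i] != bs[i] {
			return false
		}
	}
	return true
}

// compareClusters orders clusters by algorithm, then by ID.
func compareClusters(a, b ClusterID) int {
	if c := cmp.Compare(a.Algorithm, b.Algorithm); c != 0 {
		return c
	}
	return cmp.Compare(a.ID, b.ID)
}

// sameClusterSet is a hypothetical helper: it sorts copies of both inputs and
// then compares element-wise. This equals a set comparison only when neither
// input contains duplicates.
func sameClusterSet(as, bs []ClusterID) bool {
	as, bs = slices.Clone(as), slices.Clone(bs)
	slices.SortFunc(as, compareClusters)
	slices.SortFunc(bs, compareClusters)
	return clustersEqual(as, bs)
}

func main() {
	x := []ClusterID{{"reason-v3", "0a"}, {"testname-v3", "bb"}}
	y := []ClusterID{{"testname-v3", "bb"}, {"reason-v3", "0a"}}
	fmt.Println(clustersEqual(x, y))  // false: same clusters, different order
	fmt.Println(sameClusterSet(x, y)) // true
}
```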

func EscapeToGraphical

func EscapeToGraphical(value string) string

EscapeToGraphical escapes the input so that it only contains graphic Unicode characters. Use it on test names and failure reasons before presenting them in any UI context.
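
A minimal sketch of this kind of escaping, assuming that "graphic" means unicode.IsGraphic and that non-graphic runes are rendered as Go-style escape sequences via strconv.QuoteRune. Both are assumptions for illustration; the package's exact escape format is not documented here. Note that this sketch leaves backslashes unescaped, so its output is for display only and is not reversible.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// escapeToGraphical is a hypothetical sketch, not the package's implementation:
// graphic runes (letters, marks, numbers, punctuation, symbols, spaces) pass
// through; any other rune is replaced by its Go escape sequence.
func escapeToGraphical(value string) string {
	var b strings.Builder
	for _, r := range value {
		if unicode.IsGraphic(r) {
			b.WriteRune(r)
			continue
		}
		// strconv.QuoteRune returns e.g. '\n' including the single quotes; drop them.
		q := strconv.QuoteRune(r)
		b.WriteString(q[1 : len(q)-1])
	}
	return b.String()
}

func main() {
	fmt.Println(escapeToGraphical("first line\nsecond\tline")) // prints: first line\nsecond\tline
}
```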

func QuoteForRule

func QuoteForRule(value string) string

QuoteForRule escapes the input and returns it as a double-quoted string literal suitable for use in a failure association rule.
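
A sketch of quoting a value as a double-quoted literal. It assumes, for illustration only, that rule string literals escape `\` and `"` with a backslash; the real rule grammar (see the lang package) may escape more or different characters, so treat quoteForRule as hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteForRule is a hypothetical sketch: it assumes the rule language uses
// backslash escapes for backslash and double quote inside "..." literals.
func quoteForRule(value string) string {
	var b strings.Builder
	b.WriteByte('"')
	for _, r := range value {
		if r == '\\' || r == '"' {
			b.WriteByte('\\') // escape characters that would end or corrupt the literal
		}
		b.WriteRune(r)
	}
	b.WriteByte('"')
	return b.String()
}

func main() {
	fmt.Println(quoteForRule(`Error: "timeout" in C:\tmp`)) // prints: "Error: \"timeout\" in C:\\tmp"
}
```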

func SortClusters

func SortClusters(cs []ClusterID)

SortClusters sorts the given clusters in place, in ascending order of algorithm and then ID.

Types

type ClusterDescription

type ClusterDescription struct {
	// Title is a short, one-line description of the cluster, for use
	// in the bug title.
	Title string
	// Description is a human-readable description of the cluster.
	Description string
}

ClusterDescription captures the description of a cluster, for use in bug filing.

type ClusterID

type ClusterID struct {
	// Algorithm is the name of the clustering algorithm that identified
	// the cluster.
	Algorithm string `json:"algorithm"`
	// ID is the cluster identifier returned by the algorithm. The underlying
	// identifier is at most 16 bytes, but is represented here as a hexadecimal
	// string of up to 32 lowercase hexadecimal characters.
	ID string `json:"id"`
}

ClusterID represents the identity of a cluster. The LUCI Project is omitted as it is assumed to be implicit from the context.

func (ClusterID) IsBugCluster

func (c ClusterID) IsBugCluster() bool

IsBugCluster returns whether this cluster is backed by a failure association rule and was produced by a version of the failure association rule-based clustering algorithm.

func (ClusterID) IsEmpty

func (c ClusterID) IsEmpty() bool

IsEmpty returns whether the cluster ID is equal to its zero value.

func (ClusterID) IsFailureReasonCluster

func (c ClusterID) IsFailureReasonCluster() bool

IsFailureReasonCluster returns whether this cluster was made by a version of the failure reason clustering algorithm.

func (ClusterID) IsTestNameCluster

func (c ClusterID) IsTestNameCluster() bool

IsTestNameCluster returns whether this cluster was made by a version of the test name clustering algorithm.
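
Given the algorithm name prefixes defined under Constants, the cluster-kind checks can plausibly be expressed as prefix tests on the algorithm name. The sketch below assumes exactly that; the lowercase method names are hypothetical stand-ins, and the real IsBugCluster may check more than the prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// Prefix values copied from the constants documented above.
const (
	failureReasonAlgorithmPrefix = "reason-"
	rulesAlgorithmPrefix         = "rules-"
	testNameAlgorithmPrefix      = "testname-"
)

type ClusterID struct {
	Algorithm string
	ID        string
}

// The three helpers below are hypothetical and look only at the algorithm prefix.
func (c ClusterID) isBugCluster() bool {
	return strings.HasPrefix(c.Algorithm, rulesAlgorithmPrefix)
}

func (c ClusterID) isFailureReasonCluster() bool {
	return strings.HasPrefix(c.Algorithm, failureReasonAlgorithmPrefix)
}

func (c ClusterID) isTestNameCluster() bool {
	return strings.HasPrefix(c.Algorithm, testNameAlgorithmPrefix)
}

// isEmpty compares against the zero value, as IsEmpty is documented to do.
func (c ClusterID) isEmpty() bool {
	return c == (ClusterID{})
}

func main() {
	for _, c := range []ClusterID{
		{"rules-v2", "0123abcd"},
		{"reason-v3", "0a"},
		{"testname-v3", "bb"},
		{},
	} {
		fmt.Printf("%+v bug=%v reason=%v testname=%v empty=%v\n",
			c, c.isBugCluster(), c.isFailureReasonCluster(), c.isTestNameCluster(), c.isEmpty())
	}
}
```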

func (ClusterID) Key

func (c ClusterID) Key() string

Key returns a value that can be used to uniquely identify the cluster. This is designed for cases where cluster IDs need to be used as keys in a map.
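
A sketch of one possible key encoding and its use for counting failures per cluster. The "algorithm:id" format is a hypothetical choice, not necessarily what Key returns. It stays unique because algorithm names cannot contain ':' (they must match AlgorithmRePattern), so the first ':' always separates the two parts.

```go
package main

import "fmt"

type ClusterID struct {
	Algorithm string
	ID        string
}

// key is a hypothetical encoding: since the algorithm part never contains ':',
// distinct (Algorithm, ID) pairs always produce distinct keys.
func (c ClusterID) key() string {
	return c.Algorithm + ":" + c.ID
}

func main() {
	// One cluster membership per failure; two failures share the same cluster.
	memberships := []ClusterID{
		{"reason-v3", "0a"},
		{"testname-v3", "bb"},
		{"reason-v3", "0a"},
	}
	counts := make(map[string]int)
	for _, c := range memberships {
		counts[c.key()]++
	}
	fmt.Println(counts) // map[reason-v3:0a:2 testname-v3:bb:1]
}
```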

func (ClusterID) String

func (c ClusterID) String() string

String returns a string representation of the cluster, for debugging.

func (ClusterID) Validate

func (c ClusterID) Validate() error

Validate validates that the algorithm and ID parts of the cluster ID are valid.

func (ClusterID) ValidateIDPart

func (c ClusterID) ValidateIDPart() error

ValidateIDPart validates that the ID part of the cluster ID is valid.
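
A sketch of what these checks might look like, derived from the documentation on this page: the algorithm must match AlgorithmRe, and the ID must be a non-empty string of at most 32 lowercase hexadecimal characters (2 × MaxClusterIDBytes). The error messages and the exact set of checks are assumptions; the package may enforce additional rules per algorithm.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

const maxClusterIDBytes = 16

// Anchored copy of AlgorithmRePattern, as used by AlgorithmRe.
var algorithmRe = regexp.MustCompile(`^[0-9a-z\-]{1,26}-v[1-9][0-9]{0,3}$`)

type ClusterID struct {
	Algorithm string
	ID        string
}

func isLowerHex(r rune) bool {
	return ('0' <= r && r <= '9') || ('a' <= r && r <= 'f')
}

// validateIDPart checks the ID against the ClusterID field documentation:
// non-empty and at most 2*maxClusterIDBytes lowercase hexadecimal characters.
func (c ClusterID) validateIDPart() error {
	if c.ID == "" {
		return errors.New("cluster ID must be specified")
	}
	if len(c.ID) > 2*maxClusterIDBytes {
		return fmt.Errorf("cluster ID is %d characters, exceeds %d", len(c.ID), 2*maxClusterIDBytes)
	}
	for _, r := range c.ID {
		if !isLowerHex(r) {
			return fmt.Errorf("cluster ID %q is not lowercase hexadecimal", c.ID)
		}
	}
	return nil
}

// validate checks the algorithm name first, then the ID part.
func (c ClusterID) validate() error {
	if !algorithmRe.MatchString(c.Algorithm) {
		return fmt.Errorf("algorithm %q is not a valid algorithm name", c.Algorithm)
	}
	return c.validateIDPart()
}

func main() {
	for _, c := range []ClusterID{
		{"reason-v3", "00112233445566778899aabbccddeeff"},
		{"reason-v3", "ABCD"},
		{"Reason-v3", "abcd"},
	} {
		fmt.Println(c, c.validate())
	}
}
```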

type ClusterResults

type ClusterResults struct {
	// AlgorithmsVersion is the version of clustering algorithms used to
	// cluster test results in this chunk. (This is a version over the
	// set of algorithms, distinct from the version of a single algorithm,
	// e.g.: v1 -> {reason-v1}, v2 -> {reason-v1, testname-v1},
	// v3 -> {reason-v2, testname-v1}.)
	AlgorithmsVersion int64
	// ConfigVersion is the version of LUCI Analysis project configuration
	// used to cluster the test results. Clustering algorithms can rely
	// on the configuration to alter their behaviour, so changes to
	// the configuration should trigger re-clustering of test results.
	ConfigVersion time.Time
	// RulesVersion is the version of failure association rules used
	// to cluster test results. This is the most recent PredicateLastUpdated
	// time in the snapshot of failure association rules used to cluster
	// the test results.
	RulesVersion time.Time
	// Algorithms is the set of algorithms that were used to cluster
	// the test results. Each entry is an algorithm name.
	// When stored alongside the clustered test results, this allows only
	// the new algorithms to be run when re-clustering (for efficiency).
	Algorithms map[string]struct{}
	// Clusters records the clusters each test result is in;
	// one slice of ClusterIDs for each test result. For each test result,
	// clusters must be in sorted order, with no duplicates.
	Clusters [][]ClusterID
}

ClusterResults represents the results of clustering a list of test failures.

type ClusterSummary

type ClusterSummary struct {
	// Example is an example failure contained within the cluster.
	Example Failure

	// TopTests is a list of up to 5 of the most commonly occurring tests
	// included in the cluster.
	TopTests []string
}

ClusterSummary captures information about a cluster. This is a subset of the information captured by LUCI Analysis for failures.

type Failure

type Failure struct {
	// The name of the test that failed.
	TestID string
	// The failure reason, which explains why the test failed.
	Reason *pb.FailureReason
}

Failure captures the minimal information required to cluster a failure. This is a subset of the information captured by LUCI Analysis for failures.

func FailureFromProto

func FailureFromProto(f *cpb.Failure) *Failure

FailureFromProto extracts failure information relevant for clustering from a LUCI Analysis failure proto.

func FailuresFromProtos

func FailuresFromProtos(protos []*cpb.Failure) []*Failure

FailuresFromProtos extracts failure information relevant for clustering from a set of LUCI Analysis failure protos.

type FailureUpdate

type FailureUpdate struct {
	// TestResult is the failure that was re-clustered.
	TestResult *cpb.Failure
	// PreviousClusters are the clusters the failure was previously in.
	PreviousClusters []ClusterID
	// NewClusters are the clusters the failure is now in.
	NewClusters []ClusterID
}

FailureUpdate describes the changes made to the clustering of a specific test failure.

type Update

type Update struct {
	// Project is the LUCI Project containing the chunk which is being
	// (re-)clustered.
	Project string
	// ChunkID is the identity of the chunk which is being (re-)clustered.
	ChunkID string
	// Updates describes how each failure in the chunk was (re)clustered.
	// It contains one entry for each failure in the chunk that has
	// had its clusters changed.
	Updates []*FailureUpdate
}

Update describes changes made to the clustering of a chunk.
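
How an Update might be assembled so that it contains only failures whose clusters changed, as the Updates field describes. The types here are hypothetical local stand-ins (failure replaces the *cpb.Failure protobuf), and the project name and chunk ID are made up. Cluster lists are assumed sorted and duplicate-free, so element-wise equality is set equality.

```go
package main

import (
	"fmt"
	"slices"
)

type ClusterID struct {
	Algorithm string
	ID        string
}

// failure is a hypothetical stand-in for the *cpb.Failure protobuf.
type failure struct{ TestID string }

type failureUpdate struct {
	TestResult       *failure
	PreviousClusters []ClusterID
	NewClusters      []ClusterID
}

type update struct {
	Project string
	ChunkID string
	Updates []*failureUpdate
}

// buildUpdate is a hypothetical helper that records only failures whose
// clusters changed. prev and next hold one cluster list per failure.
func buildUpdate(project, chunkID string, failures []*failure, prev, next [][]ClusterID) (*update, error) {
	if len(prev) != len(failures) || len(next) != len(failures) {
		return nil, fmt.Errorf("got %d failures but %d/%d cluster lists", len(failures), len(prev), len(next))
	}
	u := &update{Project: project, ChunkID: chunkID}
	for i, f := range failures {
		if slices.Equal(prev[i], next[i]) {
			continue // unchanged failures are omitted
		}
		u.Updates = append(u.Updates, &failureUpdate{TestResult: f, PreviousClusters: prev[i], NewClusters: next[i]})
	}
	return u, nil
}

func main() {
	failures := []*failure{{TestID: "test-a"}, {TestID: "test-b"}}
	prev := [][]ClusterID{{{"reason-v3", "0a"}}, {{"reason-v3", "0a"}}}
	next := [][]ClusterID{{{"reason-v3", "0a"}}, {{"reason-v3", "0a"}, {"rules-v2", "ff"}}}
	u, err := buildUpdate("example-project", "0123abcd", failures, prev, next)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(u.Updates), u.Updates[0].TestResult.TestID) // 1 test-b
}
```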

Directories

Path Synopsis
failurereason
Package failurereason contains the failure reason clustering algorithm for LUCI Analysis.
testname
Package testname contains the test name-based clustering algorithm for LUCI Analysis.
testname/rules
Package rules provides methods to evaluate test name clustering rules.
Package rules contains methods to read and write failure association rules.
exporter
Package exporter provides methods to interact with the failure_assocation_rules BigQuery table.
lang
Package lang parses failure association rule predicates.
Package shards provides methods to access the ReclusteringShards Spanner table.
