clustering

package

v0.0.0-...-4d0e586 | Latest | Published: Dec 18, 2024 | License: Apache-2.0 | Imports: 10 | Imported by: 0

Repository

chromium.googlesource.com/infra/luci/luci-go

Documentation ¶

Index ¶

Constants
Variables
func AlgorithmsAndClustersEqual(a *ClusterResults, b *ClusterResults) bool
func ClustersAreSortedNoDuplicates(cs []ClusterID) bool
func ClustersEqual(as []ClusterID, bs []ClusterID) bool
func EscapeToGraphical(value string) string
func QuoteForRule(value string) string
func SortClusters(cs []ClusterID)
type ClusterDescription
type ClusterID
type ClusterResults
type ClusterSummary
type Failure
- func FailureFromProto(f *cpb.Failure) *Failure
- func FailuresFromProtos(protos []*cpb.Failure) []*Failure
type FailureUpdate
type Update

Constants ¶

const AlgorithmRePattern = `[0-9a-z\-]{1,26}-v[1-9][0-9]{0,3}`

AlgorithmRePattern is the regular expression pattern matching validly formed clustering algorithm names. The overarching requirement is [0-9a-z\-]{1,32}, which we subdivide into an algorithm name of up to 26 characters, the separator "-v", and an algorithm version number of up to 4 digits.
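The length budget can be checked with a short standalone program. This is a minimal sketch that copies the pattern above and anchors it the same way AlgorithmRe does; `isValidAlgorithm` is a hypothetical helper used only for illustration and is not part of this package.

```go
package main

import (
	"fmt"
	"regexp"
)

// Copied from the AlgorithmRePattern constant documented above.
const algorithmRePattern = `[0-9a-z\-]{1,26}-v[1-9][0-9]{0,3}`

// Anchored at both ends, mirroring how AlgorithmRe is built.
var algorithmRe = regexp.MustCompile(`^` + algorithmRePattern + `$`)

// isValidAlgorithm is a hypothetical helper, for illustration only.
func isValidAlgorithm(name string) bool {
	return algorithmRe.MatchString(name)
}

func main() {
	names := []string{"reason-v1", "testname-v12", "rules-v0", "Reason-v1", "reason"}
	for _, name := range names {
		fmt.Printf("%q valid: %v\n", name, isValidAlgorithm(name))
	}
}
```

A version may not start with 0, uppercase letters are rejected, and a name without the "-v" version suffix does not match.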

const FailureReasonAlgorithmPrefix = "reason-"

FailureReasonAlgorithmPrefix is the algorithm name prefix used by all versions of the failure reason clustering algorithm.

const MaxClusterIDBytes = 16

MaxClusterIDBytes is the maximum number of bytes the algorithm-determined cluster ID may occupy. This is the raw number of bytes; if the ID is hex-encoded (e.g. for use in a BigQuery table), its length in characters may be double this number.
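The relationship between raw bytes and hex characters can be illustrated as follows. This is a sketch only: `hexID` is a hypothetical helper, and the constant is copied from the declaration above rather than imported.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Copied from the MaxClusterIDBytes constant documented above.
const maxClusterIDBytes = 16

// hexID is a hypothetical helper that hex-encodes a raw cluster ID
// after checking it against the byte limit.
func hexID(raw []byte) (string, error) {
	if len(raw) > maxClusterIDBytes {
		return "", fmt.Errorf("cluster ID is %d bytes, limit is %d", len(raw), maxClusterIDBytes)
	}
	// Each byte becomes two lowercase hexadecimal characters.
	return hex.EncodeToString(raw), nil
}

func main() {
	raw := make([]byte, maxClusterIDBytes)
	id, err := hexID(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(raw), len(id)) // prints "16 32"
}
```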

const RulesAlgorithmPrefix = "rules-"

RulesAlgorithmPrefix is the algorithm name prefix used by all versions of the rules-based clustering algorithm.

const TestNameAlgorithmPrefix = "testname-"

TestNameAlgorithmPrefix is the algorithm name prefix used by all versions of the test name clustering algorithm.

Variables ¶

var AlgorithmRe = regexp.MustCompile(`^` + AlgorithmRePattern + `$`)

AlgorithmRe matches validly formed clustering algorithm names.

var ChunkRe = regexp.MustCompile(`^[0-9a-f]{1,32}$`)

ChunkRe matches validly formed chunk IDs.

Functions ¶

func AlgorithmsAndClustersEqual ¶

func AlgorithmsAndClustersEqual(a *ClusterResults, b *ClusterResults) bool

AlgorithmsAndClustersEqual returns whether the algorithms and clusters of two cluster results are equivalent.

func ClustersAreSortedNoDuplicates ¶

func ClustersAreSortedNoDuplicates(cs []ClusterID) bool

ClustersAreSortedNoDuplicates verifies that clusters are in sorted order and there are no duplicate clusters.

func ClustersEqual ¶

func ClustersEqual(as []ClusterID, bs []ClusterID) bool

ClustersEqual returns whether the clusters in `as` are element-wise equal to those in `bs`. To test set-wise cluster equality, call this method with both slices in sorted order and with no duplicates.
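The difference between element-wise and set-wise equality can be seen in a small sketch. The `ClusterID` type here is a local copy of the struct documented under Types, and `clustersEqual` is an assumed reimplementation written for illustration, not the package's source.

```go
package main

import "fmt"

// ClusterID is a local copy of the documented struct, for illustration.
type ClusterID struct {
	Algorithm string
	ID        string
}

// clustersEqual is an illustrative element-wise comparison:
// same length and the same ClusterID at every index.
func clustersEqual(as, bs []ClusterID) bool {
	if len(as) != len(bs) {
		return false
	}
	for i := range as {
		if as[i] != bs[i] {
			return false
		}
	}
	return true
}

func main() {
	a := []ClusterID{{"reason-v1", "aa"}, {"testname-v1", "bb"}}
	b := []ClusterID{{"testname-v1", "bb"}, {"reason-v1", "aa"}}
	// Same set of clusters, different order: not element-wise equal.
	fmt.Println(clustersEqual(a, b))
	// Identical order: element-wise equal.
	fmt.Println(clustersEqual(a, a))
}
```

Because order matters, callers that care only about set membership need to sort and de-duplicate both inputs first.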

func EscapeToGraphical ¶

func EscapeToGraphical(value string) string

EscapeToGraphical escapes the input so that it only contains graphic unicode characters. Use on test names and failure reasons before presenting to any UI context.
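One way to obtain this kind of escaping with the standard library is `strconv.QuoteToGraphic`. The sketch below is an assumption about how such a function could be written, not a confirmed copy of the package's implementation; `escapeToGraphical` is a local, hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// escapeToGraphical is an illustrative sketch. It quotes the value so
// that non-graphic characters become escape sequences, then strips the
// surrounding quotes and restores plain double quotes.
func escapeToGraphical(value string) string {
	quoted := strconv.QuoteToGraphic(value)
	// Drop the leading and trailing double quote added by QuoteToGraphic.
	inner := quoted[1 : len(quoted)-1]
	// Double quotes are graphic characters, so undo their escaping.
	return strings.ReplaceAll(inner, `\"`, `"`)
}

func main() {
	fmt.Println(escapeToGraphical("line one\nline two"))
	fmt.Println(escapeToGraphical("tab\there"))
	fmt.Println(escapeToGraphical(`say "hi"`))
}
```

Control characters such as newline and tab come out as visible escape sequences, while ordinary printable text passes through unchanged.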

func QuoteForRule ¶

func QuoteForRule(value string) string

QuoteForRule escapes the input to a double-quoted string literal suitable for use in a failure association rule.

func SortClusters ¶

func SortClusters(cs []ClusterID)

SortClusters sorts the given clusters in ascending algorithm and then ID order.
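The ordering, and the sorted-with-no-duplicates property that ClustersAreSortedNoDuplicates verifies, can be sketched as follows. `ClusterID` is a local copy of the documented struct, and `sortClusters` and `sortedNoDuplicates` are assumed reimplementations for illustration rather than the package's source.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterID is a local copy of the documented struct, for illustration.
type ClusterID struct {
	Algorithm string
	ID        string
}

// sortClusters orders clusters by algorithm, then by ID, ascending.
func sortClusters(cs []ClusterID) {
	sort.Slice(cs, func(i, j int) bool {
		if cs[i].Algorithm != cs[j].Algorithm {
			return cs[i].Algorithm < cs[j].Algorithm
		}
		return cs[i].ID < cs[j].ID
	})
}

// sortedNoDuplicates reports whether each cluster is strictly greater
// than its predecessor under the same ordering.
func sortedNoDuplicates(cs []ClusterID) bool {
	for i := 1; i < len(cs); i++ {
		prev, cur := cs[i-1], cs[i]
		if prev.Algorithm > cur.Algorithm {
			return false
		}
		if prev.Algorithm == cur.Algorithm && prev.ID >= cur.ID {
			return false
		}
	}
	return true
}

func main() {
	cs := []ClusterID{{"testname-v1", "02"}, {"reason-v1", "ff"}, {"reason-v1", "0a"}}
	fmt.Println(sortedNoDuplicates(cs)) // prints "false"
	sortClusters(cs)
	fmt.Println(cs, sortedNoDuplicates(cs))
}
```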

Types ¶

type ClusterDescription ¶

type ClusterDescription struct {
	// Title is a short, one-line description of the cluster, for use
	// in the bug title.
	Title string
	// Description is a human-readable description of the cluster.
	Description string
}

ClusterDescription captures the description of a cluster, for use in bug filing.

type ClusterID ¶

type ClusterID struct {
	// Algorithm is the name of the clustering algorithm that identified
	// the cluster.
	Algorithm string `json:"algorithm"`
	// ID is the cluster identifier returned by the algorithm. The underlying
	// identifier is at most 16 bytes, but is represented here as a hexadecimal
	// string of up to 32 lowercase hexadecimal characters.
	ID string `json:"id"`
}

ClusterID represents the identity of a cluster. The LUCI Project is omitted as it is assumed to be implicit from the context.

func (ClusterID) IsBugCluster ¶

func (c ClusterID) IsBugCluster() bool

IsBugCluster returns whether this cluster is backed by a failure association rule, and produced by a version of the failure association rule based clustering algorithm.

func (ClusterID) IsEmpty ¶

func (c ClusterID) IsEmpty() bool

IsEmpty returns whether the cluster ID is equal to its zero value.

func (ClusterID) IsFailureReasonCluster ¶

func (c ClusterID) IsFailureReasonCluster() bool

IsFailureReasonCluster returns whether this cluster was made by a version of the failure reason clustering algorithm.

func (ClusterID) IsTestNameCluster ¶

func (c ClusterID) IsTestNameCluster() bool

IsTestNameCluster returns whether this cluster was made by a version of the test name clustering algorithm.
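Since each algorithm family shares a documented name prefix, a classification like the ones above can be sketched with `strings.HasPrefix`. This is an assumption about the approach, not the package's source: the real methods may apply additional checks, and `algorithmFamily` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// Copied from the prefix constants documented above.
const (
	failureReasonAlgorithmPrefix = "reason-"
	rulesAlgorithmPrefix         = "rules-"
	testNameAlgorithmPrefix      = "testname-"
)

// algorithmFamily is a hypothetical helper that maps an algorithm name
// to the family implied by its prefix.
func algorithmFamily(algorithm string) string {
	switch {
	case strings.HasPrefix(algorithm, failureReasonAlgorithmPrefix):
		return "failure reason"
	case strings.HasPrefix(algorithm, rulesAlgorithmPrefix):
		return "rules"
	case strings.HasPrefix(algorithm, testNameAlgorithmPrefix):
		return "test name"
	default:
		return "unknown"
	}
}

func main() {
	for _, name := range []string{"reason-v2", "rules-v1", "testname-v3", "other-v1"} {
		fmt.Printf("%s: %s\n", name, algorithmFamily(name))
	}
}
```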

func (ClusterID) Key ¶

func (c ClusterID) Key() string

Key returns a value that uniquely identifies the cluster. It is intended for cases where cluster IDs are used as keys in a map.

func (ClusterID) String ¶

func (c ClusterID) String() string

String returns a string representation of the cluster, for debugging.

func (ClusterID) Validate ¶

func (c ClusterID) Validate() error

Validate checks that the algorithm and ID parts of the cluster ID are valid.

func (ClusterID) ValidateIDPart ¶

func (c ClusterID) ValidateIDPart() error

ValidateIDPart validates that the ID part of the cluster ID is valid.

type ClusterResults ¶

type ClusterResults struct {
	// AlgorithmsVersion is the version of clustering algorithms used to
	// cluster test results in this chunk. (This is a version over the
	// set of algorithms, distinct from the version of a single algorithm,
	// e.g.: v1 -> {reason-v1}, v2 -> {reason-v1, testname-v1},
	// v3 -> {reason-v2, testname-v1}.)
	AlgorithmsVersion int64
	// ConfigVersion is the version of LUCI Analysis project configuration
	// used to cluster the test results. Clustering algorithms can rely
	// on the configuration to alter their behaviour, so changes to
	// the configuration should trigger re-clustering of test results.
	ConfigVersion time.Time
	// RulesVersion is the version of failure association rules used
	// to cluster test results. This is the most recent PredicateLastUpdated
	// time in the snapshot of failure association rules used to cluster
	// the test results.
	RulesVersion time.Time
	// Algorithms is the set of algorithms that were used to cluster
	// the test results. Each entry is an algorithm name.
	// When stored alongside the clustered test results, this allows only
	// the new algorithms to be run when re-clustering (for efficiency).
	Algorithms map[string]struct{}
	// Clusters records the clusters each test result is in;
	// one slice of ClusterIDs for each test result. For each test result,
	// clusters must be in sorted order, with no duplicates.
	Clusters [][]ClusterID
}

ClusterResults represents the results of clustering a list of test failures.
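The Algorithms field is a set keyed by algorithm name, which makes it cheap to work out which algorithms still need to run when re-clustering. The sketch below illustrates that idea under stated assumptions: `algorithmsToRun` is a hypothetical helper, and the algorithm names are examples taken from the AlgorithmsVersion comment above.

```go
package main

import (
	"fmt"
	"sort"
)

// algorithmsToRun is a hypothetical helper. Given the algorithms in the
// current algorithms version and the set already applied to a chunk, it
// returns the names that have not been run yet, in sorted order.
func algorithmsToRun(current []string, alreadyRun map[string]struct{}) []string {
	var missing []string
	for _, name := range current {
		if _, ok := alreadyRun[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Set of algorithms recorded alongside previously clustered results.
	alreadyRun := map[string]struct{}{
		"reason-v1":   {},
		"testname-v1": {},
	}
	// Algorithms in the newer algorithms version.
	current := []string{"reason-v2", "testname-v1"}
	fmt.Println(algorithmsToRun(current, alreadyRun)) // prints "[reason-v2]"
}
```

Only `reason-v2` needs to be run here, because `testname-v1` results are already stored.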

type ClusterSummary ¶

type ClusterSummary struct {
	// Example is an example failure contained within the cluster.
	Example Failure

	// TopTests is a list of up to 5 of the most commonly occurring tests
	// included in the cluster.
	TopTests []string
}

ClusterSummary captures information about a cluster. This is a subset of the information captured by LUCI Analysis for failures.

type Failure ¶

type Failure struct {
	// The name of the test that failed.
	TestID string
	// The failure reason explaining why the test failed.
	Reason *pb.FailureReason
}

Failure captures the minimal information required to cluster a failure. This is a subset of the information captured by LUCI Analysis for failures.

func FailureFromProto ¶

func FailureFromProto(f *cpb.Failure) *Failure

FailureFromProto extracts failure information relevant for clustering from a LUCI Analysis failure proto.

func FailuresFromProtos ¶

func FailuresFromProtos(protos []*cpb.Failure) []*Failure

FailuresFromProtos extracts failure information relevant for clustering from a set of LUCI Analysis failure protos.

type FailureUpdate ¶

type FailureUpdate struct {
	// TestResult is the failure that was re-clustered.
	TestResult *cpb.Failure
	// PreviousClusters are the clusters the failure was previously in.
	PreviousClusters []ClusterID
	// NewClusters are the clusters the failure is now in.
	NewClusters []ClusterID
}

FailureUpdate describes the changes made to the clustering of a specific test failure.

type Update ¶

type Update struct {
	// Project is the LUCI Project containing the chunk which is being
	// (re-)clustered.
	Project string
	// ChunkID is the identity of the chunk which is being (re-)clustered.
	ChunkID string
	// Updates describes how each failure in the chunk was (re-)clustered.
	// It contains one entry for each failure in the chunk that has
	// had its clusters changed.
	Updates []*FailureUpdate
}

Update describes changes made to the clustering of a chunk.

Directories ¶

Path	Synopsis
algorithms
failurereason	Package failurereason contains the failure reason clustering algorithm for LUCI Analysis.
rulesalgorithm
testname	Package testname contains the test name-based clustering algorithm for LUCI Analysis.
testname/rules	Package rules provides methods to evaluate test name clustering rules.
chunkstore
ingestion
proto
reclustering
orchestrator
rules	Package rules contains methods to read and write failure association rules.
cache
exporter	Package exporter provides methods to interact with the failure_assocation_rules BigQuery table.
lang	Package lang parses failure association rule predicates.
runs
shards	Package shards provides methods to access the ReclusteringShards Spanner table.
state
