clustering2

package
v0.0.0-...-4b8fb4f
Warning

This package is not in the latest version of its module.

Published: Jan 7, 2025 License: BSD-3-Clause Imports: 16 Imported by: 2

Documentation

Constants

const (

	// K is the k in k-means.
	K = 50

	// MAX_KMEANS_ITERATIONS is the maximum number of k-means iterations to run.
	MAX_KMEANS_ITERATIONS = 100

	// KMEAN_EPSILON is the smallest change in the k-means total error we will
	// accept per iteration.  If the change in error falls below KMEAN_EPSILON
	// the iteration will terminate.
	KMEAN_EPSILON = 1.0
)
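
As a rough illustration of how these three constants could bound a k-means run, the sketch below clusters 1-D points with Lloyd's algorithm. The lower-case constants are hypothetical stand-ins with a smaller k, and kmeans1D is an assumption made for illustration, not this package's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// Hypothetical stand-ins for K, MAX_KMEANS_ITERATIONS and KMEAN_EPSILON.
const (
	k             = 2
	maxIterations = 100
	epsilon       = 1.0
)

// kmeans1D runs Lloyd's algorithm on 1-D points, seeding the centroids with
// the first k points (it assumes len(points) >= k). It returns the final
// centroids and the number of iterations that were run.
func kmeans1D(points []float64) ([]float64, int) {
	centroids := append([]float64(nil), points[:k]...)
	lastErr := math.Inf(1)
	iterations := 0
	for iterations < maxIterations {
		iterations++
		sums := make([]float64, k)
		counts := make([]int, k)
		totalErr := 0.0
		// Assign each point to its nearest centroid and accumulate the
		// squared distance as the total error.
		for _, p := range points {
			best, bestDist := 0, math.Inf(1)
			for i, c := range centroids {
				if d := (p - c) * (p - c); d < bestDist {
					best, bestDist = i, d
				}
			}
			sums[best] += p
			counts[best]++
			totalErr += bestDist
		}
		// Move each non-empty cluster's centroid to the mean of its members.
		for i := range centroids {
			if counts[i] > 0 {
				centroids[i] = sums[i] / float64(counts[i])
			}
		}
		// Stop once the improvement in total error drops below epsilon.
		if lastErr-totalErr < epsilon {
			break
		}
		lastErr = totalErr
	}
	return centroids, iterations
}

func main() {
	centroids, n := kmeans1D([]float64{1, 2, 3, 10, 11, 12})
	fmt.Println(centroids, n)
}
```

The iteration cap guards against slow convergence, while the epsilon check ends the loop early once further iterations stop paying off.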

Variables

This section is empty.

Functions

func SortValuePercentSlice

func SortValuePercentSlice(arr []ValuePercent)

SortValuePercentSlice sorts the slice of ValuePercent in a way that's useful to humans.

Types

type ClusterSummaries

type ClusterSummaries struct {
	Clusters        []*ClusterSummary
	StdDevThreshold float32
	K               int
}

ClusterSummaries holds one summary for each cluster that the k-means clustering found.

func CalculateClusterSummaries

func CalculateClusterSummaries(ctx context.Context, df *dataframe.DataFrame, k int, stddevThreshold float32, progress Progress, interesting float32, stepDetection types.StepDetection) (*ClusterSummaries, error)

CalculateClusterSummaries runs k-means clustering over the trace shapes.

type ClusterSummary

type ClusterSummary struct {
	// Centroid is the calculated centroid of the cluster.
	Centroid []float32 `json:"centroid"`

	// Keys of all the members of the Cluster.
	//
	// The keys are sorted so that the ones at the beginning of the list are
	// closest to the centroid.
	//
	// Note: This value is not serialized to JSON.
	Keys []string `json:"-"`

	// Shortcut is the id of a shortcut for the above Keys.
	Shortcut string `json:"shortcut"`

	// ParamSummaries is a summary of all the parameters in the cluster.
	ParamSummaries []ValuePercent `json:"param_summaries2"`

	// StepFit is info on the fit of the centroid to a step function.
	StepFit *stepfit.StepFit `json:"step_fit"`

	// StepPoint is the ColumnHeader for the step point.
	StepPoint *dataframe.ColumnHeader `json:"step_point"`

	// Num is the number of observations that are in this cluster.
	Num int `json:"num"`

	// Timestamp is the timestamp when this regression was found.
	Timestamp time.Time `json:"ts"`

	// NotificationID is the ID of the notification sent for this regression.
	// Will be the empty string if no notification has been sent.
	NotificationID string `json:"notification_id,omitempty"`
}

ClusterSummary is a summary of a single cluster of traces.
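
The struct tags above determine the JSON shape: Keys is tagged `json:"-"` and is never emitted, and NotificationID is dropped when empty. The sketch below demonstrates this with a local mirror of a subset of the fields; clusterSummary, encode, the shortcut id and the key string are hypothetical, and StepFit, StepPoint and Timestamp are omitted so the example has no external dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterSummary mirrors a subset of ClusterSummary's fields and JSON tags.
// It is a local stand-in, not the package's type.
type clusterSummary struct {
	Centroid       []float32 `json:"centroid"`
	Keys           []string  `json:"-"`
	Shortcut       string    `json:"shortcut"`
	Num            int       `json:"num"`
	NotificationID string    `json:"notification_id,omitempty"`
}

// encode marshals a summary to its JSON text.
func encode(c clusterSummary) (string, error) {
	b, err := json.Marshal(c)
	return string(b), err
}

func main() {
	s, err := encode(clusterSummary{
		Centroid: []float32{0.5, 1.5},
		Keys:     []string{",config=8888,"}, // hypothetical key; not serialized
		Shortcut: "X123",                    // hypothetical shortcut id
		Num:      1,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"centroid":[0.5,1.5],"shortcut":"X123","num":1}
}
```

Because Keys never reaches the JSON output, a consumer would presumably need the Shortcut id to recover the member keys.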

func NewClusterSummary

func NewClusterSummary(ctx context.Context) *ClusterSummary

NewClusterSummary returns a new ClusterSummary.

type Progress

type Progress func(totalError float64)

Progress is a function that is called periodically with the progress being made in clustering.
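
A caller might use a Progress callback to record the total error as clustering proceeds. In the sketch below the Progress type is redeclared locally so the example stands alone, and runWithProgress is a hypothetical stand-in for the clustering loop; how often the real package invokes the callback is not stated in this documentation.

```go
package main

import "fmt"

// Progress mirrors the package's callback type.
type Progress func(totalError float64)

// runWithProgress is a hypothetical stand-in for a clustering loop: it
// reports each supplied total error to the callback, in order. A nil
// callback is tolerated and simply skipped.
func runWithProgress(errs []float64, progress Progress) {
	for _, e := range errs {
		if progress != nil {
			progress(e)
		}
	}
}

func main() {
	var seen []float64
	runWithProgress([]float64{246, 41.68, 4}, func(totalError float64) {
		seen = append(seen, totalError)
	})
	fmt.Println(seen)
}
```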

type ValuePercent

type ValuePercent struct {
	// Value is the key value pair, e.g. "config=8888".
	Value string `json:"value"`

	// Percent is a percentage as an int, i.e. 80% is represented as 80.
	Percent int `json:"percent"`
}

ValuePercent is a weight proportional to the number of times the key=value appears in a cluster. Used in ClusterSummary.

func GetParamSummariesForKeys

func GetParamSummariesForKeys(keys []string) []ValuePercent

GetParamSummariesForKeys summarizes all the parameters for all observations in a cluster.

The return value is a []ValuePercent covering every parameter. The entries for each parameter are sorted by Percent, with higher Percent values first.
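
One way such a summary could be computed is sketched below. It assumes keys are comma-delimited lists of name=value pairs such as ",arch=x86,config=8888,", truncates percentages to integers, and orders results by descending Percent and then by Value. The key format, the rounding, the ordering, and the names valuePercent and summarize are all assumptions for illustration, not the package's actual behavior.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// valuePercent mirrors ValuePercent; it is a local stand-in.
type valuePercent struct {
	Value   string
	Percent int
}

// summarize counts how often each name=value pair appears across keys and
// converts each count to an integer percentage of len(keys), truncating any
// fractional part.
func summarize(keys []string) []valuePercent {
	counts := map[string]int{}
	for _, key := range keys {
		for _, pair := range strings.Split(strings.Trim(key, ","), ",") {
			if strings.Contains(pair, "=") {
				counts[pair]++
			}
		}
	}
	ret := make([]valuePercent, 0, len(counts))
	for pair, n := range counts {
		ret = append(ret, valuePercent{Value: pair, Percent: n * 100 / len(keys)})
	}
	// Highest percentages first; ties are broken alphabetically by Value so
	// the output is deterministic despite map iteration order.
	sort.Slice(ret, func(i, j int) bool {
		if ret[i].Percent != ret[j].Percent {
			return ret[i].Percent > ret[j].Percent
		}
		return ret[i].Value < ret[j].Value
	})
	return ret
}

func main() {
	keys := []string{
		",arch=x86,config=8888,",
		",arch=x86,config=565,",
		",arch=arm,config=8888,",
		",arch=x86,config=8888,",
	}
	for _, vp := range summarize(keys) {
		fmt.Printf("%s %d%%\n", vp.Value, vp.Percent)
	}
}
```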
