cluster

package
v0.1.0
Published: Feb 15, 2025 License: BSD-3-Clause Imports: 9 Imported by: 0

README

cluster

cluster implements agglomerative clustering of items based on metric distance Matrix data, which is provided as an input and must have been generated with a distance-like metric (one that increases with dissimilarity).

There are different standard ways of accumulating the aggregate distance of a node based on its leaves:

  • Min: the minimum-distance across leaves, i.e., the single-linkage weighting function.
  • Max: the maximum-distance across leaves, i.e., the complete-linkage weighting function.
  • Avg: the average-distance across leaves, i.e., the average-linkage weighting function.
  • Contrast: maxd + (average within distance - average between distance), where maxd is the maximum distance in the matrix.
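The first three options can be illustrated with a minimal sketch. It assumes a plain [][]float64 distance matrix rather than the package's tensor.Tensor, and the names minLink, maxLink, and avgLink are hypothetical helpers, not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// minLink returns the smallest distance between any item in a and any item
// in b (single linkage). dmat is a plain square distance matrix, used here
// as a stand-in for the package's tensor type.
func minLink(a, b []int, dmat [][]float64) float64 {
	d := math.Inf(1)
	for _, i := range a {
		for _, j := range b {
			d = math.Min(d, dmat[i][j])
		}
	}
	return d
}

// maxLink returns the largest distance between any item in a and any item
// in b (complete linkage).
func maxLink(a, b []int, dmat [][]float64) float64 {
	d := math.Inf(-1)
	for _, i := range a {
		for _, j := range b {
			d = math.Max(d, dmat[i][j])
		}
	}
	return d
}

// avgLink returns the mean distance over all pairs with one item in a and
// one item in b (average linkage).
func avgLink(a, b []int, dmat [][]float64) float64 {
	sum := 0.0
	for _, i := range a {
		for _, j := range b {
			sum += dmat[i][j]
		}
	}
	return sum / float64(len(a)*len(b))
}

func main() {
	dmat := [][]float64{
		{0, 1, 4, 6},
		{1, 0, 3, 5},
		{4, 3, 0, 2},
		{6, 5, 2, 0},
	}
	a, b := []int{0, 1}, []int{2, 3}
	fmt.Println(minLink(a, b, dmat), maxLink(a, b, dmat), avgLink(a, b, dmat))
}
```

For clusters {0, 1} and {2, 3} the cross-cluster distances are 4, 6, 3, and 5, so the three functions give 3, 6, and 4.5 respectively.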

Cluster is the main function, taking the name of a registered MetricFunc to use for comparing distances between clusters.

Documentation

Index

Constants

This section is empty.

Variables

var Funcs map[string]MetricFunc

Funcs is a registry of clustering metric functions, initialized with the standard options.

Functions

func AvgFunc

func AvgFunc(aix, bix []int, ntot int, maxd float64, dmat tensor.Tensor) float64

AvgFunc is the average-distance or average-linkage weighting function for comparing two clusters a and b, given by their list of indexes. ntot is the total number of nodes, and dmat is the square distance matrix [ntot x ntot].

func Call

func Call(funcName string, aix, bix []int, ntot int, maxd float64, dmat tensor.Tensor) float64

Call calls a cluster metric function by name.
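A name-keyed registry of this kind can be sketched as follows. This is an illustration only: linkFunc is a simplified, hypothetical stand-in for MetricFunc, the key names are assumptions, and the boolean result for unknown names is a choice made for this sketch (the behavior of Call for an unregistered name is not documented here).

```go
package main

import (
	"fmt"
	"math"
)

// linkFunc is a simplified, hypothetical stand-in for MetricFunc that takes
// a plain [][]float64 distance matrix.
type linkFunc func(a, b []int, dmat [][]float64) float64

// minLink is the single-linkage distance between clusters a and b.
func minLink(a, b []int, dmat [][]float64) float64 {
	d := math.Inf(1)
	for _, i := range a {
		for _, j := range b {
			d = math.Min(d, dmat[i][j])
		}
	}
	return d
}

// maxLink is the complete-linkage distance between clusters a and b.
func maxLink(a, b []int, dmat [][]float64) float64 {
	d := math.Inf(-1)
	for _, i := range a {
		for _, j := range b {
			d = math.Max(d, dmat[i][j])
		}
	}
	return d
}

// funcs maps names to linkage functions. The key names are illustrative.
var funcs = map[string]linkFunc{
	"Min": minLink,
	"Max": maxLink,
}

// call looks up name in funcs and applies the function to clusters a and b.
// It reports false when no function is registered under name.
func call(name string, a, b []int, dmat [][]float64) (float64, bool) {
	fn, ok := funcs[name]
	if !ok {
		return 0, false
	}
	return fn(a, b, dmat), true
}

func main() {
	dmat := [][]float64{
		{0, 1, 4, 6},
		{1, 0, 3, 5},
		{4, 3, 0, 2},
		{6, 5, 2, 0},
	}
	a, b := []int{0, 1}, []int{2, 3}
	for _, name := range []string{"Min", "Max", "Median"} {
		d, ok := call(name, a, b, dmat)
		fmt.Println(name, d, ok)
	}
}
```

Because the registry is an ordinary map, a caller can add an entry under a new name before clustering.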

func ContrastFunc

func ContrastFunc(aix, bix []int, ntot int, maxd float64, dmat tensor.Tensor) float64

ContrastFunc computes maxd + (average within distance - average between distance) for two clusters a and b, given by their list of indexes. The average between distance is the average distance between all items in a and b versus all items outside them. ntot is the total number of nodes, and dmat is the square distance matrix [ntot x ntot]. maxd is the maximum distance and is needed to ensure that the resulting distances are positive.
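One reading of this formula is sketched below. It assumes a plain [][]float64 matrix, takes "average within distance" to mean the average distance between the items of a and the items of b, and treats the between term as zero when no items lie outside a and b. These are assumptions of the sketch, and contrast and avgLink are hypothetical names.

```go
package main

import "fmt"

// avgLink returns the mean distance over all pairs with one item in a and
// one item in b.
func avgLink(a, b []int, dmat [][]float64) float64 {
	sum := 0.0
	for _, i := range a {
		for _, j := range b {
			sum += dmat[i][j]
		}
	}
	return sum / float64(len(a)*len(b))
}

// contrast returns maxd + (within - between), where within is the average
// distance between items of a and items of b, and between is the average
// distance from items in a or b to every item outside both.
func contrast(a, b []int, ntot int, maxd float64, dmat [][]float64) float64 {
	within := avgLink(a, b, dmat)

	// Collect the merged cluster and mark its members.
	ab := make([]int, 0, len(a)+len(b))
	in := make(map[int]bool, len(a)+len(b))
	for _, i := range a {
		ab = append(ab, i)
		in[i] = true
	}
	for _, i := range b {
		ab = append(ab, i)
		in[i] = true
	}

	// Everything else counts as "outside".
	var out []int
	for i := 0; i < ntot; i++ {
		if !in[i] {
			out = append(out, i)
		}
	}
	if len(out) == 0 {
		// Assumption: with nothing outside, the between term is zero.
		return maxd + within
	}
	between := avgLink(ab, out, dmat)
	return maxd + (within - between)
}

func main() {
	dmat := [][]float64{
		{0, 1, 4, 6},
		{1, 0, 3, 5},
		{4, 3, 0, 2},
		{6, 5, 2, 0},
	}
	fmt.Println(contrast([]int{0}, []int{1}, 4, 6, dmat))
	fmt.Println(contrast([]int{0}, []int{3}, 4, 6, dmat))
}
```

Merging items 0 and 1 (close to each other, far from the rest) scores 6 + (1 - 4.5) = 2.5, while merging items 0 and 3 scores 6 + (6 - 3) = 9, so the first pair is preferred.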

func MaxFunc

func MaxFunc(aix, bix []int, ntot int, maxd float64, dmat tensor.Tensor) float64

MaxFunc is the maximum-distance or complete-linkage weighting function for comparing two clusters a and b, given by their list of indexes. ntot is the total number of nodes, and dmat is the square distance matrix [ntot x ntot].

func MinFunc

func MinFunc(aix, bix []int, ntot int, maxd float64, dmat tensor.Tensor) float64

MinFunc is the minimum-distance or single-linkage weighting function for comparing two clusters a and b, given by their list of indexes. ntot is the total number of nodes, and dmat is the square distance matrix [ntot x ntot].

func Plot

func Plot(pt *table.Table, root *Node, dmat, labels tensor.Tensor)

Plot sets the rows of given data table to trace out lines with labels that will render cluster plot starting at root node when plotted with a standard plotting package. The lines double-back on themselves to form a continuous line to be plotted.

Types

type MetricFunc

type MetricFunc func(aix, bix []int, ntot int, maxd float64, dmat tensor.Tensor) float64

MetricFunc is a clustering distance metric function that evaluates the aggregate distance between nodes, given the indexes of the leaves in the a and b clusters, which are indexes into an ntot x ntot distance matrix dmat. maxd is the maximum distance value in dmat, which is needed by ContrastFunc and perhaps others.

type Metrics

type Metrics int32 //enums:enum

Metrics are standard clustering distance metric functions, specifying how a node computes its distance based on its leaves.

const (
	// Min is the minimum-distance or single-linkage weighting function.
	Min Metrics = iota

	// Max is the maximum-distance or complete-linkage weighting function.
	Max

	// Avg is the average-distance or average-linkage weighting function.
	Avg

	// Contrast computes maxd + (average within distance - average between distance).
	Contrast
)
const MetricsN Metrics = 4

MetricsN is the highest valid value for type Metrics, plus one.

func MetricsValues

func MetricsValues() []Metrics

MetricsValues returns all possible values for the type Metrics.

func (Metrics) Desc

func (i Metrics) Desc() string

Desc returns the description of the Metrics value.

func (Metrics) Int64

func (i Metrics) Int64() int64

Int64 returns the Metrics value as an int64.

func (Metrics) MarshalText

func (i Metrics) MarshalText() ([]byte, error)

MarshalText implements the encoding.TextMarshaler interface.

func (*Metrics) SetInt64

func (i *Metrics) SetInt64(in int64)

SetInt64 sets the Metrics value from an int64.

func (*Metrics) SetString

func (i *Metrics) SetString(s string) error

SetString sets the Metrics value from its string representation, and returns an error if the string is invalid.

func (Metrics) String

func (i Metrics) String() string

String returns the string representation of this Metrics value.

func (*Metrics) UnmarshalText

func (i *Metrics) UnmarshalText(text []byte) error

UnmarshalText implements the encoding.TextUnmarshaler interface.

func (Metrics) Values

func (i Metrics) Values() []enums.Enum

Values returns all possible values for the type Metrics.

type Node

type Node struct {
	// index into original distance matrix; only valid for terminal leaves.
	Index int

	// Distance value for this node, i.e., how far apart all the kids were from
	// each other when this node was created. It is 0 for leaf nodes.
	Dist float64

	// ParDist is the total aggregate distance from parents, i.e., the X-axis offset at which our cluster starts.
	ParDist float64

	// Y is the y-axis value for this node; if a parent, it is the average of its kids' Y values,
	// otherwise it counts down.
	Y float64

	// Kids are child nodes under this one.
	Kids []*Node
}

Node is one node in the cluster

func Cluster

func Cluster(funcName string, dmat, labels tensor.Tensor) *Node

Cluster implements agglomerative clustering, based on a distance matrix dmat, e.g., as computed by metric.Matrix method, using a metric that increases in value with greater dissimilarity. labels provides an optional String tensor list of labels for the elements of the distance matrix. This calls InitAllLeaves to initialize the root node with all of the leaves, and then Glom to do the iterative agglomerative clustering process. If you want to start with pre-defined initial clusters, then call Glom with a root node so-initialized.
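The overall agglomerative process can be sketched in a few lines. This is a simplified illustration, not the package's implementation: it uses a plain [][]float64 matrix and a hypothetical node type, always merges exactly two clusters per step, and does not treat ties specially.

```go
package main

import (
	"fmt"
	"math"
)

// node is a hypothetical, reduced version of a cluster tree node.
type node struct {
	index int     // leaf index into dmat; -1 for merged nodes
	dist  float64 // linkage distance at which the kids were merged
	kids  []*node
}

// leaves appends the leaf indexes under n to ix and returns the result.
func (n *node) leaves(ix []int) []int {
	if len(n.kids) == 0 {
		return append(ix, n.index)
	}
	for _, k := range n.kids {
		ix = k.leaves(ix)
	}
	return ix
}

// maxLink is the complete-linkage distance between clusters a and b.
func maxLink(a, b []int, dmat [][]float64) float64 {
	d := math.Inf(-1)
	for _, i := range a {
		for _, j := range b {
			d = math.Max(d, dmat[i][j])
		}
	}
	return d
}

// glom starts with one cluster per item and repeatedly merges the two
// closest clusters, as judged by link, until a single root remains.
func glom(dmat [][]float64, link func(a, b []int, dmat [][]float64) float64) *node {
	if len(dmat) == 0 {
		return nil
	}
	clusters := make([]*node, len(dmat))
	for i := range dmat {
		clusters[i] = &node{index: i}
	}
	for len(clusters) > 1 {
		bi, bj, best := 0, 1, math.Inf(1)
		for i := 0; i < len(clusters); i++ {
			for j := i + 1; j < len(clusters); j++ {
				d := link(clusters[i].leaves(nil), clusters[j].leaves(nil), dmat)
				if d < best {
					bi, bj, best = i, j, d
				}
			}
		}
		merged := &node{index: -1, dist: best, kids: []*node{clusters[bi], clusters[bj]}}
		// Remove cluster bj (bj > bi), then put the merged node in slot bi.
		clusters = append(clusters[:bj], clusters[bj+1:]...)
		clusters[bi] = merged
	}
	return clusters[0]
}

func main() {
	dmat := [][]float64{
		{0, 1, 4, 6},
		{1, 0, 3, 5},
		{4, 3, 0, 2},
		{6, 5, 2, 0},
	}
	root := glom(dmat, maxLink)
	fmt.Println(root.dist, root.leaves(nil))
}
```

With this matrix and complete linkage, items 0 and 1 merge first at distance 1, items 2 and 3 merge next at distance 2, and the two pairs join at distance 6.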

func Glom

func Glom(root *Node, funcName string, dmat tensor.Tensor) *Node

Glom does the iterative agglomerative clustering, based on the given distance matrix, using a root node that has already been initialized with the starting clusters. By default these are all of the leaves, but they can be anything if you want to start with predefined clusters.

func InitAllLeaves

func InitAllLeaves(ntot int) *Node

InitAllLeaves returns a standard root node initialized with all of the leaves.

func NewNode

func NewNode(na, nb *Node, dst float64) *Node

NewNode merges two nodes into a new node

func (*Node) Indexes

func (nn *Node) Indexes(ix []int, ctr *int)

Indexes collects all the indexes in this node
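The slice-plus-counter signature suggests that the caller supplies a slice long enough for all leaves and a counter marking the next free slot. A minimal sketch under that assumption, using a hypothetical reduced node type:

```go
package main

import "fmt"

// node is a hypothetical, reduced version of a cluster tree node.
type node struct {
	index int
	kids  []*node
}

// indexes writes the leaf indexes under n into ix, starting at position
// *ctr, and advances *ctr past the last index written. The caller must
// provide an ix that is long enough to hold every leaf.
func (n *node) indexes(ix []int, ctr *int) {
	if len(n.kids) == 0 {
		ix[*ctr] = n.index
		(*ctr)++
		return
	}
	for _, k := range n.kids {
		k.indexes(ix, ctr)
	}
}

func main() {
	// Tree shape: ((0, 1), 2)
	inner := &node{kids: []*node{{index: 0}, {index: 1}}}
	root := &node{kids: []*node{inner, {index: 2}}}

	ix := make([]int, 3)
	ctr := 0
	root.indexes(ix, &ctr)
	fmt.Println(ix, ctr)
}
```

Passing the counter by pointer lets every recursive call share one write position, so no intermediate slices are allocated.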

func (*Node) IsLeaf

func (nn *Node) IsLeaf() bool

IsLeaf returns true if node is a leaf of the tree with no kids

func (*Node) Plot

func (nn *Node) Plot(pt *table.Table, dmat, labels tensor.Tensor)

Plot sets the rows of given data table to trace out lines with labels that will render this node in a cluster plot when plotted with a standard plotting package. The lines double-back on themselves to form a continuous line to be plotted.

func (*Node) SetParDist

func (nn *Node) SetParDist(pard float64)

SetParDist sets the parent distance for the nodes in preparation for plotting.
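Given that ParDist is described as the total aggregate distance from parents, one plausible reading is a top-down pass in which each kid starts where its parent's own distance ends. The sketch below assumes exactly that rule and uses a hypothetical reduced node type.

```go
package main

import "fmt"

// node is a hypothetical, reduced version of a cluster tree node.
type node struct {
	dist    float64 // distance at which this node's kids were merged
	parDist float64 // accumulated distance of all ancestors
	kids    []*node
}

// setParDist records pard on n and passes pard plus n's own distance down
// to each kid, so parDist grows along every path from the root.
func (n *node) setParDist(pard float64) {
	n.parDist = pard
	for _, k := range n.kids {
		k.setParDist(pard + n.dist)
	}
}

func main() {
	leaf := &node{}
	inner := &node{dist: 1, kids: []*node{leaf, {}}}
	root := &node{dist: 6, kids: []*node{inner, {}}}

	root.setParDist(0)
	fmt.Println(root.parDist, inner.parDist, leaf.parDist)
}
```

Here the root starts at 0, its kid at 6, and the leaf below that kid at 7.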

func (*Node) SetYs

func (nn *Node) SetYs(nextY *float64)

SetYs sets the Y-axis values for the nodes in preparation for plotting.
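The Y field is documented as the average of the kids' Y values for a parent and as a value that counts down for leaves. A minimal sketch of such a pass follows. The step of 1 per leaf and the reduced node type are assumptions of this sketch.

```go
package main

import "fmt"

// node is a hypothetical, reduced version of a cluster tree node.
type node struct {
	y    float64
	kids []*node
}

// setYs assigns *nextY to each leaf in visiting order, decrementing it by 1
// after every leaf, and gives each parent the mean of its kids' y values.
func (n *node) setYs(nextY *float64) {
	if len(n.kids) == 0 {
		n.y = *nextY
		(*nextY)--
		return
	}
	sum := 0.0
	for _, k := range n.kids {
		k.setYs(nextY)
		sum += k.y
	}
	n.y = sum / float64(len(n.kids))
}

func main() {
	// Tree shape: ((l0, l1), l2)
	l0, l1, l2 := &node{}, &node{}, &node{}
	inner := &node{kids: []*node{l0, l1}}
	root := &node{kids: []*node{inner, l2}}

	nextY := 2.0
	root.setYs(&nextY)
	fmt.Println(l0.y, l1.y, l2.y, inner.y, root.y)
}
```

Starting from 2, the leaves receive 2, 1, and 0, the inner node sits at 1.5, and the root sits at 0.75.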

func (*Node) Sprint

func (nn *Node) Sprint(labels tensor.Tensor, depth int) string

Sprint returns a string representation of this node, using the given labels and starting at the given depth.
