clust

package
v0.3.5 Latest
Warning

This package is not in the latest version of its module.

Published: Oct 19, 2024 | License: BSD-3-Clause | Imports: 9 | Imported by: 2

README

clust

clust implements agglomerative clustering of items based on similarity matrix data from the simat package.

GlomClust is the main function. It takes a DistFunc option that determines how the distance between clusters of items is computed.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func AvgDist

func AvgDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64

AvgDist is the average-distance (average-linkage) weighting function for comparing two clusters a and b, each given by its list of indexes. It returns the mean distance over all pairs of items drawn from a and b. ntot is the total number of nodes, and smat is the square similarity matrix [ntot x ntot].
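A minimal sketch of average linkage, assuming smat holds the ntot x ntot matrix flattened in row-major order (entry (i, j) at smat[i*ntot+j]). avgDist is a hypothetical stand-in written for illustration, not the package's own code.

```go
package main

import "fmt"

// avgDist is a hypothetical sketch of average linkage: the mean of
// smat[a][b] over every pair (a in aix, b in bix). smat is assumed to be a
// row-major flattened ntot x ntot matrix; maxd is unused here and is kept
// only so the signature matches DistFunc.
func avgDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	sum := 0.0
	for _, a := range aix {
		for _, b := range bix {
			sum += smat[a*ntot+b]
		}
	}
	return sum / float64(len(aix)*len(bix))
}

func main() {
	// Symmetric 3x3 distance matrix, flattened row by row.
	smat := []float64{
		0, 1, 4,
		1, 0, 2,
		4, 2, 0,
	}
	// a = {0}, b = {1, 2}: mean of d(0,1)=1 and d(0,2)=4.
	fmt.Println(avgDist([]int{0}, []int{1, 2}, 3, 4, smat)) // 2.5
}
```

With a symmetric matrix the row/column orientation of the flattening does not affect the result.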

func ContrastDist

func ContrastDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64

ContrastDist computes maxd + (average within distance - average between distance) for two clusters a and b, each given by its list of indexes. The average between distance is the average distance between all items in a and b and all items outside them. ntot is the total number of nodes, and smat is the square similarity matrix [ntot x ntot]. maxd is the maximum distance in smat; adding it ensures that the resulting distances are positive.
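The sketch below is one reading of ContrastDist, under two assumptions that the description does not pin down: the "within" term is taken as the average distance between a and b, and when no items lie outside a and b the "between" term is taken as 0. smat is assumed to be flattened row-major; avgDist and contrastDist are hypothetical local helpers, not the package's code.

```go
package main

import "fmt"

// avgDist: mean distance over all (a, b) pairs; smat assumed row-major.
func avgDist(aix, bix []int, ntot int, smat []float64) float64 {
	sum := 0.0
	for _, a := range aix {
		for _, b := range bix {
			sum += smat[a*ntot+b]
		}
	}
	return sum / float64(len(aix)*len(bix))
}

// contrastDist is a hypothetical sketch of maxd + (within - between).
// Assumption: "within" is the average distance between a and b, and
// "between" is the average distance from every item in a∪b to every item
// outside a∪b (0 if there are no outside items).
func contrastDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	within := avgDist(aix, bix, ntot, smat)
	ab := append(append([]int{}, aix...), bix...)
	inAB := make(map[int]bool, len(ab))
	for _, i := range ab {
		inAB[i] = true
	}
	var out []int
	for i := 0; i < ntot; i++ {
		if !inAB[i] {
			out = append(out, i)
		}
	}
	between := 0.0 // assumption for the case with no outside items
	if len(out) > 0 {
		between = avgDist(ab, out, ntot, smat)
	}
	return maxd + (within - between)
}

func main() {
	smat := []float64{
		0, 1, 4,
		1, 0, 2,
		4, 2, 0,
	}
	// a={0}, b={1}: within = 1; between = mean(d(0,2)=4, d(1,2)=2) = 3.
	fmt.Println(contrastDist([]int{0}, []int{1}, 3, 4, smat)) // 4 + (1 - 3) = 2
}
```

Because within is at least 0 and between is at most maxd for non-negative distances, adding maxd keeps the result from going negative.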

func MaxDist

func MaxDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64

MaxDist is the maximum-distance (complete-linkage) weighting function for comparing two clusters a and b, each given by its list of indexes. ntot is the total number of nodes, and smat is the square similarity matrix [ntot x ntot].
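A sketch of complete linkage over the same assumed row-major layout of smat; maxDist is a hypothetical helper, not the package's implementation. Starting from negative infinity means the first real distance always replaces the initial value.

```go
package main

import (
	"fmt"
	"math"
)

// maxDist is a hypothetical sketch of complete linkage: the largest distance
// between any item of a and any item of b (smat assumed row-major).
func maxDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	md := math.Inf(-1)
	for _, a := range aix {
		for _, b := range bix {
			md = math.Max(md, smat[a*ntot+b])
		}
	}
	return md
}

func main() {
	smat := []float64{
		0, 1, 4,
		1, 0, 2,
		4, 2, 0,
	}
	fmt.Println(maxDist([]int{0}, []int{1, 2}, 3, 4, smat)) // 4
}
```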

func MinDist

func MinDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64

MinDist is the minimum-distance (single-linkage) weighting function for comparing two clusters a and b, each given by its list of indexes. ntot is the total number of nodes, and smat is the square similarity matrix [ntot x ntot].
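The single-linkage counterpart, again a hypothetical sketch assuming a row-major flattened smat. On the same three-item matrix it reports the closest pair across the two clusters rather than the farthest.

```go
package main

import (
	"fmt"
	"math"
)

// minDist is a hypothetical sketch of single linkage: the smallest distance
// between any item of a and any item of b (smat assumed row-major).
func minDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	md := math.Inf(1)
	for _, a := range aix {
		for _, b := range bix {
			md = math.Min(md, smat[a*ntot+b])
		}
	}
	return md
}

func main() {
	smat := []float64{
		0, 1, 4,
		1, 0, 2,
		4, 2, 0,
	}
	fmt.Println(minDist([]int{0}, []int{1, 2}, 3, 4, smat)) // 1
}
```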

func Plot

func Plot(pt *table.Table, root *Node, smat *simat.SimMat)

Plot sets the rows of the given data table to trace out lines, with labels, that render a cluster plot starting at the root node when plotted with a standard plotting package. The lines double back on themselves to form a single continuous line to be plotted.

Types

type DistFunc

type DistFunc func(aix, bix []int, ntot int, maxd float64, smat []float64) float64

DistFunc is a clustering distance function that evaluates the aggregate distance between nodes, given the indexes of the leaves in clusters a and b, which index into an ntot x ntot similarity (distance) matrix smat. maxd is the maximum distance value in smat; it is needed by the ContrastDist function and possibly others.
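Because DistFunc is an ordinary function type, any function with this signature can be used where a DistFunc is expected. The sketch declares a local copy of the type and a hypothetical normAvgDist that scales average linkage by maxd; neither exists in the package, and the row-major smat layout is an assumption.

```go
package main

import "fmt"

// distFunc is a hypothetical local copy of the DistFunc signature.
type distFunc func(aix, bix []int, ntot int, maxd float64, smat []float64) float64

// normAvgDist is a hypothetical custom distance: average linkage divided by
// maxd, so non-negative distances map into [0, 1]. Assumes maxd > 0.
func normAvgDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	sum := 0.0
	for _, a := range aix {
		for _, b := range bix {
			sum += smat[a*ntot+b]
		}
	}
	return sum / float64(len(aix)*len(bix)) / maxd
}

func main() {
	smat := []float64{
		0, 1, 4,
		1, 0, 2,
		4, 2, 0,
	}
	var f distFunc = normAvgDist // any function with this signature qualifies
	fmt.Println(f([]int{0}, []int{1, 2}, 3, 4, smat)) // 2.5 / 4 = 0.625
}
```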

func StdFunc

func StdFunc(std StdDists) DistFunc

StdFunc returns the standard distance function specified by std.
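A sketch of how an enum-to-function mapping like StdFunc can be written with a switch. All names are hypothetical local copies, and smat is assumed to be row-major. Contrast is deliberately left out to keep the example short, so this sketch returns nil for it, which may differ from what the package does.

```go
package main

import (
	"fmt"
	"math"
)

// distFunc is a hypothetical local copy of the DistFunc signature.
type distFunc func(aix, bix []int, ntot int, maxd float64, smat []float64) float64

// stdDists mirrors the StdDists enum order: Min, Max, Avg, Contrast.
type stdDists int32

const (
	Min stdDists = iota
	Max
	Avg
	Contrast
)

// The three linkage functions below assume a row-major ntot x ntot smat.
func minDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	d := math.Inf(1)
	for _, a := range aix {
		for _, b := range bix {
			d = math.Min(d, smat[a*ntot+b])
		}
	}
	return d
}

func maxDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	d := math.Inf(-1)
	for _, a := range aix {
		for _, b := range bix {
			d = math.Max(d, smat[a*ntot+b])
		}
	}
	return d
}

func avgDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	sum := 0.0
	for _, a := range aix {
		for _, b := range bix {
			sum += smat[a*ntot+b]
		}
	}
	return sum / float64(len(aix)*len(bix))
}

// stdFunc selects a distance function by enum value. Contrast is not
// implemented in this sketch, so it (and any unknown value) yields nil.
func stdFunc(std stdDists) distFunc {
	switch std {
	case Min:
		return minDist
	case Max:
		return maxDist
	case Avg:
		return avgDist
	}
	return nil
}

func main() {
	smat := []float64{
		0, 1, 4,
		1, 0, 2,
		4, 2, 0,
	}
	for _, s := range []stdDists{Min, Max, Avg} {
		fmt.Println(stdFunc(s)([]int{0}, []int{1, 2}, 3, 4, smat)) // 1, then 4, then 2.5
	}
}
```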

type Node

type Node struct {

	// index into original distance matrix -- only valid for terminal leaves
	Index int

	// distance for this node -- how far apart were all the kids from each other when this node was created -- is 0 for leaf nodes
	Dist float64

	// total aggregate distance from parents -- the X axis offset at which our cluster starts
	ParDist float64

	// y-axis value for this node -- if a parent, it is the average of its kids' Ys, otherwise it counts down
	Y float64

	// child nodes under this one
	Kids []*Node
}

Node is one node in the cluster tree.

func Glom

func Glom(smat *simat.SimMat, dfunc DistFunc) *Node

Glom implements basic agglomerative clustering, based on a raw similarity matrix as given. It calls GlomInit to initialize the root node with all of the leaves, and then calls GlomClust to do the iterative clustering process. To start with predefined initial clusters, call GlomClust directly with a root node initialized that way. The smat.Mat matrix must be a tensor.Float64.

func GlomClust

func GlomClust(root *Node, smat *simat.SimMat, dfunc DistFunc) *Node

GlomClust does the iterative agglomerative clustering, based on a raw similarity matrix as given, using a root node that has already been initialized with the starting clusters (all of the leaves by default, but it can be any set of predefined clusters). The smat.Mat matrix must be a tensor.Float64.
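A compact, self-contained sketch of the greedy agglomerative loop described above: at each step every pair of current clusters under the root is compared with the distance function, the closest pair is merged into a new node, and this repeats until a single cluster remains. The node type, the helper names, the row-major smat layout, computing maxd as the largest matrix entry, and the first-pair-wins tie-breaking are all assumptions of this sketch, not the package's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// distFunc is a hypothetical local copy of the DistFunc signature.
type distFunc func(aix, bix []int, ntot int, maxd float64, smat []float64) float64

// node keeps only the fields the clustering loop needs.
type node struct {
	Index int     // item index; meaningful only for leaves
	Dist  float64 // distance at which this node's kids were merged
	Kids  []*node
}

// leaves appends the leaf indexes under n to ix and returns the result.
func (n *node) leaves(ix []int) []int {
	if len(n.Kids) == 0 {
		return append(ix, n.Index)
	}
	for _, k := range n.Kids {
		ix = k.leaves(ix)
	}
	return ix
}

// avgDist: average linkage over a row-major ntot x ntot matrix.
func avgDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	sum := 0.0
	for _, a := range aix {
		for _, b := range bix {
			sum += smat[a*ntot+b]
		}
	}
	return sum / float64(len(aix)*len(bix))
}

// glomInit: one singleton leaf cluster per item.
func glomInit(ntot int) *node {
	root := &node{Index: -1}
	for i := 0; i < ntot; i++ {
		root.Kids = append(root.Kids, &node{Index: i})
	}
	return root
}

// glomClust merges the two closest root kids until one cluster remains.
func glomClust(root *node, ntot int, smat []float64, dfunc distFunc) *node {
	maxd := math.Inf(-1)
	for _, v := range smat {
		maxd = math.Max(maxd, v)
	}
	for len(root.Kids) > 1 {
		ba, bb, best := -1, -1, 0.0
		for a := range root.Kids {
			aix := root.Kids[a].leaves(nil)
			for b := 0; b < a; b++ {
				d := dfunc(aix, root.Kids[b].leaves(nil), ntot, maxd, smat)
				if ba < 0 || d < best { // first pair found wins ties
					ba, bb, best = a, b, d
				}
			}
		}
		merged := &node{Index: -1, Dist: best, Kids: []*node{root.Kids[ba], root.Kids[bb]}}
		kept := make([]*node, 0, len(root.Kids)-1)
		for i, k := range root.Kids {
			if i != ba && i != bb {
				kept = append(kept, k)
			}
		}
		root.Kids = append(kept, merged)
	}
	return root
}

func main() {
	// Items 0,1 are close (1); items 2,3 are close (2); the pairs are far apart.
	smat := []float64{
		0, 1, 5, 6,
		1, 0, 5, 6,
		5, 5, 0, 2,
		6, 6, 2, 0,
	}
	top := glomClust(glomInit(4), 4, smat, avgDist).Kids[0]
	fmt.Println(top.Dist, top.leaves(nil)) // 5.5 [3 2 1 0]
}
```

To start from predefined clusters, build root.Kids by hand (for example, one kid that already groups items 0 and 1) and pass that root to glomClust instead of using glomInit.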

func GlomInit

func GlomInit(ntot int) *Node

GlomInit returns a standard root node initialized with all of the leaves.
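A sketch of the initialization step, assuming the root simply gets one leaf kid per item so that every item starts as its own cluster. node and glomInit are hypothetical simplified stand-ins.

```go
package main

import "fmt"

// node keeps only the fields this sketch needs.
type node struct {
	Index int
	Kids  []*node
}

// glomInit is a hypothetical sketch: a root whose kids are one leaf per
// item 0..ntot-1, each leaf recording its item index.
func glomInit(ntot int) *node {
	root := &node{}
	root.Kids = make([]*node, ntot)
	for i := range root.Kids {
		root.Kids[i] = &node{Index: i}
	}
	return root
}

func main() {
	root := glomInit(3)
	fmt.Println(len(root.Kids), root.Kids[2].Index) // 3 2
}
```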

func GlomStd

func GlomStd(smat *simat.SimMat, std StdDists) *Node

GlomStd implements basic agglomerative clustering, based on a raw similarity matrix as given. It calls GlomInit to initialize the root node with all of the leaves, and then calls GlomClust to do the iterative clustering process. To start with predefined initial clusters, call GlomClust directly with a root node initialized that way. The smat.Mat matrix must be a tensor.Float64. Unlike Glom, GlomStd selects one of the standard distance functions via a StdDists value instead of taking a DistFunc.

func NewNode

func NewNode(na, nb *Node, dst float64) *Node

NewNode merges two nodes, na and nb, into a new node at distance dst.
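A sketch of the merge step, assuming the new node simply takes na and nb as its two kids and records dst as its Dist (the distance at which the kids were joined). node and newNode are hypothetical stand-ins.

```go
package main

import "fmt"

// node keeps only the fields this sketch needs.
type node struct {
	Index int
	Dist  float64
	Kids  []*node
}

// newNode is a hypothetical sketch: a new parent with na and nb as kids,
// recording dst as the distance at which they were merged.
func newNode(na, nb *node, dst float64) *node {
	return &node{Dist: dst, Kids: []*node{na, nb}}
}

func main() {
	a, b := &node{Index: 0}, &node{Index: 1}
	p := newNode(a, b, 1.5)
	fmt.Println(p.Dist, len(p.Kids)) // 1.5 2
}
```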

func (*Node) Indexes

func (nn *Node) Indexes(ix []int, ctr *int)

Indexes collects all the leaf indexes in this node.
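The signature suggests a caller-allocated buffer: leaf indexes are presumably written into ix starting at position *ctr, with *ctr advanced after each write, so one buffer can be reused across many calls. The sketch below implements that reading with hypothetical local names; the buffer must be large enough to hold every leaf.

```go
package main

import "fmt"

// node keeps only the fields this sketch needs.
type node struct {
	Index int
	Kids  []*node
}

func (n *node) isLeaf() bool { return len(n.Kids) == 0 }

// indexes is a hypothetical sketch: each leaf index is written into the
// preallocated ix at position *ctr, and *ctr is then incremented.
func (n *node) indexes(ix []int, ctr *int) {
	if n.isLeaf() {
		ix[*ctr] = n.Index
		(*ctr)++
		return
	}
	for _, k := range n.Kids {
		k.indexes(ix, ctr)
	}
}

func main() {
	// Tree ((0 1) 2).
	root := &node{Kids: []*node{
		{Kids: []*node{{Index: 0}, {Index: 1}}},
		{Index: 2},
	}}
	buf := make([]int, 3) // sized for the total number of leaves
	ctr := 0
	root.indexes(buf, &ctr)
	fmt.Println(buf[:ctr]) // [0 1 2]
}
```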

func (*Node) IsLeaf

func (nn *Node) IsLeaf() bool

IsLeaf returns true if the node is a leaf of the tree, i.e., it has no kids.

func (*Node) Plot

func (nn *Node) Plot(pt *table.Table, smat *simat.SimMat)

Plot sets the rows of the given data table to trace out lines, with labels, that render this node in a cluster plot when plotted with a standard plotting package. The lines double back on themselves to form a single continuous line to be plotted.
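A sketch of the double-back trace, assuming ParDist and Y have already been laid out (as SetParDist and SetYs do). Instead of a table.Table, it appends to a hypothetical point slice with X, Y, and Label columns, and takes leaf labels from a plain string slice; both are stand-ins for illustration, not the package's code. Every consecutive pair of points shares either X or Y, so the dendrogram is drawn as one continuous polyline.

```go
package main

import "fmt"

// node keeps the layout fields a plot needs (assumed already filled in).
type node struct {
	Index         int
	Dist, ParDist float64
	Y             float64
	Kids          []*node
}

// point is a hypothetical stand-in for one table row.
type point struct {
	X, Y  float64
	Label string
}

// plot appends a continuous trace of n: for each kid it moves along n's
// vertical bar to the kid's Y, out along the branch to where the kid starts,
// recurses, then doubles back along the same branch.
func (n *node) plot(pts []point, labels []string) []point {
	if len(n.Kids) == 0 {
		return append(pts, point{n.ParDist, n.Y, labels[n.Index]})
	}
	for _, k := range n.Kids {
		pts = append(pts, point{X: n.ParDist, Y: k.Y})
		pts = append(pts, point{X: n.ParDist + n.Dist, Y: k.Y})
		pts = k.plot(pts, labels)
		pts = append(pts, point{X: n.ParDist, Y: k.Y}) // double back
	}
	// End at this node's own Y so the parent can double back horizontally.
	return append(pts, point{X: n.ParDist, Y: n.Y})
}

// exampleTree builds ((a b) c) with ParDist and Y already laid out.
func exampleTree() *node {
	a := &node{Index: 0, ParDist: 4, Y: 0}
	b := &node{Index: 1, ParDist: 4, Y: -1}
	c := &node{Index: 2, ParDist: 3, Y: -2}
	m := &node{Dist: 1, ParDist: 3, Y: -0.5, Kids: []*node{a, b}}
	return &node{Dist: 3, ParDist: 0, Y: -1.25, Kids: []*node{m, c}}
}

func main() {
	for _, p := range exampleTree().plot(nil, []string{"a", "b", "c"}) {
		fmt.Println(p.X, p.Y, p.Label)
	}
}
```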

func (*Node) SetParDist

func (nn *Node) SetParDist(pard float64)

SetParDist sets the parent distance for the nodes in preparation for plotting.
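A sketch of this layout pass, assuming each kid starts where its parent's branch ends, i.e. at the parent's ParDist plus the parent's Dist. This follows the ParDist field comment ("the X axis offset at which our cluster starts"), but setParDist is a hypothetical stand-in, not the package's code.

```go
package main

import "fmt"

// node keeps only the fields this sketch needs.
type node struct {
	Dist, ParDist float64
	Kids          []*node
}

// setParDist is a hypothetical sketch: n starts at offset pard, and its kids
// start at pard + n.Dist. Leaves have Dist 0 and no kids.
func (n *node) setParDist(pard float64) {
	n.ParDist = pard
	for _, k := range n.Kids {
		k.setParDist(pard + n.Dist)
	}
}

func main() {
	a, b, c := &node{}, &node{}, &node{}
	m := &node{Dist: 1, Kids: []*node{a, b}}
	root := &node{Dist: 3, Kids: []*node{m, c}}
	root.setParDist(0)
	fmt.Println(m.ParDist, a.ParDist, c.ParDist) // 3 4 3
}
```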

func (*Node) SetYs

func (nn *Node) SetYs(nextY *float64)

SetYs sets the Y-axis values for the nodes in preparation for plotting.
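A sketch following the Y field comment: leaves take the current value of *nextY and count it down, and a parent's Y is the average of its kids' Ys. The step size of 1 is an assumption, and setYs is a hypothetical stand-in.

```go
package main

import "fmt"

// node keeps only the fields this sketch needs.
type node struct {
	Y    float64
	Kids []*node
}

// setYs is a hypothetical sketch: leaves get consecutive, decreasing Ys
// (step 1, an assumption); parents get the mean of their kids' Ys.
func (n *node) setYs(nextY *float64) {
	if len(n.Kids) == 0 {
		n.Y = *nextY
		*nextY -= 1
		return
	}
	sum := 0.0
	for _, k := range n.Kids {
		k.setYs(nextY)
		sum += k.Y
	}
	n.Y = sum / float64(len(n.Kids))
}

func main() {
	a, b, c := &node{}, &node{}, &node{}
	m := &node{Kids: []*node{a, b}}
	root := &node{Kids: []*node{m, c}}
	y := 0.0
	root.setYs(&y)
	fmt.Println(a.Y, b.Y, m.Y, c.Y, root.Y) // 0 -1 -0.5 -2 -1.25
}
```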

func (*Node) Sprint

func (nn *Node) Sprint(smat *simat.SimMat, depth int) string

Sprint prints the node to a string.

type StdDists

type StdDists int32 //enums:enum

StdDists enumerates the standard clustering distance functions.

const (
	// Min is the minimum-distance or single-linkage weighting function
	Min StdDists = iota

	// Max is the maximum-distance or complete-linkage weighting function
	Max

	// Avg is the average-distance or average-linkage weighting function
	Avg

	// Contrast computes maxd + (average within distance - average between distance)
	Contrast
)
const StdDistsN StdDists = 4

StdDistsN is the highest valid value for type StdDists, plus one.

func StdDistsValues

func StdDistsValues() []StdDists

StdDistsValues returns all possible values for the type StdDists.
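A sketch of the usual pattern behind enum helpers like StdDistsValues and String: iterate from 0 up to the N sentinel, and map each value to its name. The local names and the String output format are assumptions, not the package's generated code.

```go
package main

import "fmt"

// stdDists is a hypothetical local copy of the StdDists enum.
type stdDists int32

const (
	Min stdDists = iota
	Max
	Avg
	Contrast
)

// stdDistsN is one past the highest valid value, like StdDistsN.
const stdDistsN stdDists = 4

// Assumed names; the generated String output may differ.
var stdDistsNames = [...]string{"Min", "Max", "Avg", "Contrast"}

func (i stdDists) String() string {
	if i < 0 || i >= stdDistsN {
		return fmt.Sprintf("stdDists(%d)", int32(i))
	}
	return stdDistsNames[i]
}

// stdDistsValues returns every valid value in order, 0 .. stdDistsN-1.
func stdDistsValues() []stdDists {
	vals := make([]stdDists, 0, stdDistsN)
	for i := stdDists(0); i < stdDistsN; i++ {
		vals = append(vals, i)
	}
	return vals
}

func main() {
	fmt.Println(stdDistsValues()) // [Min Max Avg Contrast]
}
```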

func (StdDists) Desc

func (i StdDists) Desc() string

Desc returns the description of the StdDists value.

func (StdDists) Int64

func (i StdDists) Int64() int64

Int64 returns the StdDists value as an int64.

func (StdDists) MarshalText

func (i StdDists) MarshalText() ([]byte, error)

MarshalText implements the encoding.TextMarshaler interface.

func (*StdDists) SetInt64

func (i *StdDists) SetInt64(in int64)

SetInt64 sets the StdDists value from an int64.

func (*StdDists) SetString

func (i *StdDists) SetString(s string) error

SetString sets the StdDists value from its string representation, and returns an error if the string is invalid.

func (StdDists) String

func (i StdDists) String() string

String returns the string representation of this StdDists value.

func (*StdDists) UnmarshalText

func (i *StdDists) UnmarshalText(text []byte) error

UnmarshalText implements the encoding.TextUnmarshaler interface.

func (StdDists) Values

func (i StdDists) Values() []enums.Enum

Values returns all possible values for the type StdDists.
