clust

package
v1.1.16
Warning

This package is not in the latest version of its module.

Published: Feb 18, 2023 License: BSD-3-Clause Imports: 11 Imported by: 10

README

clust


clust implements agglomerative clustering of items based on simat similarity matrix data.

GlomClust is the main function, taking different DistFunc options for comparing the distance between items.

Documentation

Index

Constants

This section is empty.

Variables

var KiT_StdDists = kit.Enums.AddEnum(StdDistsN, kit.NotBitFlag, nil)

Functions

func AvgDist

func AvgDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64

AvgDist is the average-distance or average-linkage weighting function for comparing two clusters a and b, given by their lists of indexes. ntot is the total number of nodes, and smat is the square similarity matrix [ntot x ntot].
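The following is a minimal stand-alone sketch of average linkage, not the package's source. It assumes smat is stored row-major, so the entry for items i and j is smat[i*ntot+j], and that smaller values mean closer items; the lowercase name avgDist is hypothetical.

```go
package main

import "fmt"

// avgDist sketches average linkage: the mean of smat[a*ntot+b] over every
// pair with a in aix and b in bix. smat is assumed to be a row-major
// ntot x ntot distance matrix; maxd is unused but kept to match DistFunc.
func avgDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	n := len(aix) * len(bix)
	if n == 0 {
		return 0 // no pairs to average
	}
	sum := 0.0
	for _, a := range aix {
		for _, b := range bix {
			sum += smat[a*ntot+b]
		}
	}
	return sum / float64(n)
}

func main() {
	// Items 0 and 1 are close to each other; item 2 is farther from both.
	smat := []float64{
		0, 1, 4,
		1, 0, 6,
		4, 6, 0,
	}
	// Cluster {0, 1} vs cluster {2}: (4 + 6) / 2
	fmt.Println(avgDist([]int{0, 1}, []int{2}, 3, 6, smat)) // 5
}
```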

func ContrastDist

func ContrastDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64

ContrastDist computes maxd + (average within distance - average between distance) for two clusters a and b, given by their lists of indexes. The average between distance is the average distance between all items in a and b and all items outside them. ntot is the total number of nodes, and smat is the square similarity matrix [ntot x ntot]. maxd is the maximum distance and is needed to ensure that distances are positive.
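A minimal sketch of the contrast measure, under two assumptions not spelled out above: the "within" term is taken to be the average a-to-b distance (what average linkage computes), and the "between" term averages over every pair formed by an item of a or b and an item outside both. smat is assumed row-major; all names are hypothetical.

```go
package main

import "fmt"

// avgDist returns the mean of smat[a*ntot+b] over all (a, b) pairs,
// or 0 when there are no pairs.
func avgDist(aix, bix []int, ntot int, smat []float64) float64 {
	n := len(aix) * len(bix)
	if n == 0 {
		return 0
	}
	sum := 0.0
	for _, a := range aix {
		for _, b := range bix {
			sum += smat[a*ntot+b]
		}
	}
	return sum / float64(n)
}

// contrastDist sketches maxd + (within - between). Assumed here: within is
// the average a-to-b distance, and between is the average distance from
// each item of a or b to each item outside both clusters.
func contrastDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	within := avgDist(aix, bix, ntot, smat)

	// ab holds the items of both clusters; out holds every other item.
	ab := append(append([]int{}, aix...), bix...)
	inAB := make(map[int]bool, len(ab))
	for _, ix := range ab {
		inAB[ix] = true
	}
	var out []int
	for ix := 0; ix < ntot; ix++ {
		if !inAB[ix] {
			out = append(out, ix)
		}
	}
	between := avgDist(ab, out, ntot, smat)
	return maxd + (within - between)
}

func main() {
	// {0, 1} form a tight pair, as do {2, 3}; the two pairs are far apart.
	smat := []float64{
		0, 1, 5, 6,
		1, 0, 5, 6,
		5, 5, 0, 2,
		6, 6, 2, 0,
	}
	maxd := 6.0
	fmt.Println(contrastDist([]int{0}, []int{1}, 4, maxd, smat)) // 1.5
	fmt.Println(contrastDist([]int{0}, []int{2}, 4, maxd, smat)) // 7.5
}
```

Under these assumptions, with non-negative distances the within term is at least 0 and the between term is at most maxd, so adding maxd keeps the result non-negative. Merging a tight pair that is far from everything else scores lowest.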

func MaxDist

func MaxDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64

MaxDist is the maximum-distance or complete-linkage weighting function for comparing two clusters a and b, given by their lists of indexes. ntot is the total number of nodes, and smat is the square similarity matrix [ntot x ntot].

func MinDist

func MinDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64

MinDist is the minimum-distance or single-linkage weighting function for comparing two clusters a and b, given by their lists of indexes. ntot is the total number of nodes, and smat is the square similarity matrix [ntot x ntot].
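Single and complete linkage differ only in whether the closest or the farthest cross-cluster pair is kept. A sketch under the same row-major assumption (smat[i*ntot+j]); the function names are hypothetical, and returning ±Inf for an empty cluster is a choice made for this sketch only.

```go
package main

import (
	"fmt"
	"math"
)

// minDist sketches single linkage: the smallest distance between any item
// of a and any item of b (+Inf if either cluster is empty).
func minDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	md := math.Inf(1)
	for _, a := range aix {
		for _, b := range bix {
			md = math.Min(md, smat[a*ntot+b])
		}
	}
	return md
}

// maxDist sketches complete linkage: the largest such distance
// (-Inf if either cluster is empty).
func maxDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	md := math.Inf(-1)
	for _, a := range aix {
		for _, b := range bix {
			md = math.Max(md, smat[a*ntot+b])
		}
	}
	return md
}

func main() {
	smat := []float64{
		0, 1, 4,
		1, 0, 6,
		4, 6, 0,
	}
	a, b := []int{0, 1}, []int{2}
	fmt.Println(minDist(a, b, 3, 6, smat)) // 4: closest pair is (0, 2)
	fmt.Println(maxDist(a, b, 3, 6, smat)) // 6: farthest pair is (1, 2)
}
```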

func Plot

func Plot(pt *etable.Table, root *Node, smat *simat.SimMat)

Plot sets the rows of the given data table to trace out lines with labels that will render a cluster plot, starting at the root node, when plotted with a standard plotting package. The lines double back on themselves to form a continuous line to be plotted.

Types

type DistFunc

type DistFunc func(aix, bix []int, ntot int, maxd float64, smat []float64) float64

DistFunc is a clustering distance function that evaluates the aggregate distance between nodes, given the indexes of the leaves in clusters a and b, which are indexes into an ntot x ntot similarity (distance) matrix smat. maxd is the maximum distance value in smat, which is needed by the ContrastDist function and perhaps others.

func StdFunc added in v1.0.16

func StdFunc(std StdDists) DistFunc

StdFunc returns the standard distance function specified by std.
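Since DistFunc is an ordinary Go function type, StdFunc can be pictured as a switch from StdDists values to function values. The sketch below is an illustration, not the package's implementation: the lowercase types and functions are hypothetical stand-ins, smat is assumed row-major, and the contrast term assumes "within" is the average a-to-b distance while "between" averages over pairs that cross the boundary of a and b combined. Any function with the DistFunc signature, including a custom one, could be returned the same way.

```go
package main

import (
	"fmt"
	"math"
)

// distFunc mirrors the documented DistFunc signature.
type distFunc func(aix, bix []int, ntot int, maxd float64, smat []float64) float64

// stdDists mirrors the documented StdDists constants.
type stdDists int

const (
	Min stdDists = iota
	Max
	Avg
	Contrast
	StdDistsN
)

// fold combines every cross-cluster distance smat[a*ntot+b] with f.
func fold(aix, bix []int, ntot int, smat []float64, init float64, f func(acc, d float64) float64) float64 {
	acc := init
	for _, a := range aix {
		for _, b := range bix {
			acc = f(acc, smat[a*ntot+b])
		}
	}
	return acc
}

func minDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	return fold(aix, bix, ntot, smat, math.Inf(1), math.Min)
}

func maxDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	return fold(aix, bix, ntot, smat, math.Inf(-1), math.Max)
}

func avgDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	n := len(aix) * len(bix)
	if n == 0 {
		return 0
	}
	return fold(aix, bix, ntot, smat, 0, func(acc, d float64) float64 { return acc + d }) / float64(n)
}

func contrastDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	ab := append(append([]int{}, aix...), bix...)
	inAB := make(map[int]bool, len(ab))
	for _, ix := range ab {
		inAB[ix] = true
	}
	var out []int // every item outside a and b
	for ix := 0; ix < ntot; ix++ {
		if !inAB[ix] {
			out = append(out, ix)
		}
	}
	return maxd + avgDist(aix, bix, ntot, maxd, smat) - avgDist(ab, out, ntot, maxd, smat)
}

// stdFunc maps a stdDists value to its distance function, or nil if out of range.
func stdFunc(std stdDists) distFunc {
	switch std {
	case Min:
		return minDist
	case Max:
		return maxDist
	case Avg:
		return avgDist
	case Contrast:
		return contrastDist
	}
	return nil
}

func main() {
	smat := []float64{
		0, 1, 4,
		1, 0, 6,
		4, 6, 0,
	}
	for std := Min; std < StdDistsN; std++ {
		fmt.Println(int(std), stdFunc(std)([]int{0, 1}, []int{2}, 3, 6, smat))
	}
}
```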

type Node

type Node struct {
	Idx     int     `desc:"index into original distance matrix -- only valid for terminal leaves"`
	Dist    float64 `` /* 130-byte string literal not displayed */
	ParDist float64 `desc:"total aggregate distance from parents -- the X axis offset at which our cluster starts"`
	Y       float64 `desc:"y-axis value for this node -- if a parent, it is the average of its kids Y's, otherwise it counts down"`
	Kids    []*Node `desc:"child nodes under this one"`
}

Node is one node in the cluster tree.

func Glom

func Glom(smat *simat.SimMat, dfunc DistFunc) *Node

Glom implements basic agglomerative clustering, based on a raw similarity matrix as given. This calls GlomInit to initialize the root node with all of the leaves, and then calls GlomClust to do the iterative clustering process. If you want to start with pre-defined initial clusters, call GlomClust with a root node initialized accordingly. The smat.Mat matrix must be an etensor.Float64.

func GlomClust

func GlomClust(root *Node, smat *simat.SimMat, dfunc DistFunc) *Node

GlomClust does the iterative agglomerative clustering, based on a raw similarity matrix as given, using a root node that has already been initialized with the starting clusters (all of the leaves by default, but could be anything if you want to start with predefined clusters). The smat.Mat matrix must be an etensor.Float64.

func GlomInit

func GlomInit(ntot int) *Node

GlomInit returns a standard root node initialized with all ntot leaves.
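The process described for Glom, GlomClust and GlomInit can be sketched as a greedy loop that merges the closest pair of top-level clusters until one remains. This is a simplified illustration, not the library code: it works on a plain row-major []float64 instead of a simat.SimMat, takes maxd as the largest matrix entry, breaks ties by keeping the first pair found, and all lowercase names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// distFunc mirrors the documented DistFunc signature.
type distFunc func(aix, bix []int, ntot int, maxd float64, smat []float64) float64

// node keeps only the Node fields the clustering loop needs.
type node struct {
	Idx  int     // matrix index; meaningful only for leaves
	Dist float64 // distance at which this node's kids were merged
	Kids []*node
}

func (n *node) isLeaf() bool { return len(n.Kids) == 0 }

// idxs appends the leaf indexes under n to ix and returns the result.
func (n *node) idxs(ix []int) []int {
	if n.isLeaf() {
		return append(ix, n.Idx)
	}
	for _, k := range n.Kids {
		ix = k.idxs(ix)
	}
	return ix
}

// glomInit builds a root whose kids are ntot leaves, one per matrix index.
func glomInit(ntot int) *node {
	root := &node{}
	for i := 0; i < ntot; i++ {
		root.Kids = append(root.Kids, &node{Idx: i})
	}
	return root
}

// avgDist is average linkage; clusters are never empty in glomClust.
func avgDist(aix, bix []int, ntot int, maxd float64, smat []float64) float64 {
	sum := 0.0
	for _, a := range aix {
		for _, b := range bix {
			sum += smat[a*ntot+b]
		}
	}
	return sum / float64(len(aix)*len(bix))
}

// glomClust merges the two closest top-level clusters under root, one pair
// per pass, until a single cluster remains.
func glomClust(root *node, ntot int, smat []float64, dfunc distFunc) *node {
	maxd := math.Inf(-1) // largest matrix entry, passed through to dfunc
	for _, d := range smat {
		maxd = math.Max(maxd, d)
	}
	for len(root.Kids) > 1 {
		bestA, bestB, bestD := -1, -1, math.Inf(1)
		for a := range root.Kids {
			aix := root.Kids[a].idxs(nil)
			for b := 0; b < a; b++ {
				d := dfunc(aix, root.Kids[b].idxs(nil), ntot, maxd, smat)
				if d < bestD {
					bestA, bestB, bestD = a, b, d
				}
			}
		}
		merged := &node{Dist: bestD, Kids: []*node{root.Kids[bestA], root.Kids[bestB]}}
		// bestA > bestB, so removing bestA first leaves bestB's position valid.
		kids := append(root.Kids[:bestA], root.Kids[bestA+1:]...)
		kids = append(kids[:bestB], kids[bestB+1:]...)
		root.Kids = append(kids, merged)
	}
	return root
}

func main() {
	smat := []float64{
		0, 1, 3, 9,
		1, 0, 2, 9,
		3, 2, 0, 8,
		9, 9, 8, 0,
	}
	root := glomClust(glomInit(4), 4, smat, avgDist)
	top := root.Kids[0] // the single remaining cluster
	// Merges: {0,1} at 1, then {0,1}+{2} at 2.5, then +{3} at 26/3.
	fmt.Println(top.idxs(nil), top.Kids[0].Dist, top.Dist)
}
```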

func GlomStd added in v1.0.16

func GlomStd(smat *simat.SimMat, std StdDists) *Node

GlomStd implements basic agglomerative clustering, based on a raw similarity matrix as given. This calls GlomInit to initialize the root node with all of the leaves, and then calls GlomClust to do the iterative clustering process. If you want to start with pre-defined initial clusters, call GlomClust with a root node initialized accordingly. The smat.Mat matrix must be an etensor.Float64. Unlike Glom, GlomStd selects one of the standard distance functions through a StdDists value instead of taking a DistFunc.

func NewNode

func NewNode(na, nb *Node, dst float64) *Node

NewNode merges two nodes into a new node, with dst as the new node's distance.

func (*Node) Idxs

func (nn *Node) Idxs(ix []int, ctr *int)

Idxs collects all the leaf indexes under this node into ix, starting at position *ctr and incrementing ctr for each index added.
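The Idxs signature suggests a caller-supplied buffer filled in depth-first order, with ctr tracking how many slots are used, so one buffer sized to ntot can be reused across calls. A sketch of that reading, with hypothetical lowercase stand-ins for Node, Idxs and NewNode:

```go
package main

import "fmt"

// node is a pared-down, hypothetical stand-in for Node.
type node struct {
	Idx  int
	Dist float64
	Kids []*node
}

// newNode puts na and nb under a new parent at merge distance dst.
func newNode(na, nb *node, dst float64) *node {
	return &node{Dist: dst, Kids: []*node{na, nb}}
}

// idxs writes the leaf indexes under nn into ix, starting at ix[*ctr] and
// advancing *ctr once per leaf (depth-first, kids in order). ix must have
// room for every leaf.
func (nn *node) idxs(ix []int, ctr *int) {
	if len(nn.Kids) == 0 {
		ix[*ctr] = nn.Idx
		*ctr += 1
		return
	}
	for _, k := range nn.Kids {
		k.idxs(ix, ctr)
	}
}

func main() {
	// Leaves 2 and 0 merged first, then joined by leaf 1.
	tree := newNode(newNode(&node{Idx: 2}, &node{Idx: 0}, 1), &node{Idx: 1}, 3)
	buf := make([]int, 3) // one buffer sized to ntot, reusable across calls
	ctr := 0
	tree.idxs(buf, &ctr)
	fmt.Println(buf[:ctr]) // [2 0 1]
}
```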

func (*Node) IsLeaf

func (nn *Node) IsLeaf() bool

IsLeaf returns true if the node is a leaf of the tree, with no kids.

func (*Node) Plot

func (nn *Node) Plot(pt *etable.Table, smat *simat.SimMat)

Plot sets the rows of the given data table to trace out lines with labels that will render this node in a cluster plot when plotted with a standard plotting package. The lines double back on themselves to form a continuous line to be plotted.

func (*Node) SetParDist

func (nn *Node) SetParDist(pard float64)

SetParDist sets the parent distance for the nodes in preparation for plotting.

func (*Node) SetYs

func (nn *Node) SetYs(nextY *float64)

SetYs sets the Y-axis values for the nodes in preparation for plotting.
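One reading of the ParDist and Y field descriptions is sketched below: a node's kids start at its own offset plus its merge distance, leaves take successive Y values counting down from a starting value, and each parent sits at the mean Y of its kids. The type and methods are hypothetical stand-ins, not the package's code.

```go
package main

import "fmt"

// node holds the plotting-related fields documented for Node.
type node struct {
	Idx     int
	Dist    float64 // merge distance (0 for leaves)
	ParDist float64 // X offset at which this cluster starts
	Y       float64
	Kids    []*node
}

func (nn *node) isLeaf() bool { return len(nn.Kids) == 0 }

// setParDist records pard on nn; kids start at pard plus nn's own Dist.
func (nn *node) setParDist(pard float64) {
	nn.ParDist = pard
	for _, k := range nn.Kids {
		k.setParDist(pard + nn.Dist)
	}
}

// setYs gives leaves *nextY and counts it down by 1 per leaf; a parent
// gets the average Y of its kids.
func (nn *node) setYs(nextY *float64) {
	if nn.isLeaf() {
		nn.Y = *nextY
		*nextY -= 1
		return
	}
	sum := 0.0
	for _, k := range nn.Kids {
		k.setYs(nextY)
		sum += k.Y
	}
	nn.Y = sum / float64(len(nn.Kids))
}

func main() {
	// Leaves 0 and 1 merged at distance 1; leaf 2 joined at distance 3.
	inner := &node{Dist: 1, Kids: []*node{{Idx: 0}, {Idx: 1}}}
	root := &node{Dist: 3, Kids: []*node{inner, {Idx: 2}}}
	root.setParDist(0)
	y := 0.0
	root.setYs(&y)
	fmt.Println(inner.ParDist, inner.Y, root.Y) // 3 -0.5 -1.25
}
```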

func (*Node) Sprint

func (nn *Node) Sprint(smat *simat.SimMat, depth int) string

Sprint prints the node to a string.

type StdDists added in v1.0.16

type StdDists int

StdDists are standard clustering distance functions

const (
	// Min is the minimum-distance or single-linkage weighting function
	Min StdDists = iota

	// Max is the maximum-distance or complete-linkage weighting function
	Max

	// Avg is the average-distance or average-linkage weighting function
	Avg

	// Contrast computes maxd + (average within distance - average between distance)
	Contrast

	StdDistsN
)

func (*StdDists) FromString added in v1.0.16

func (i *StdDists) FromString(s string) error

func (StdDists) MarshalJSON added in v1.0.16

func (ev StdDists) MarshalJSON() ([]byte, error)

func (StdDists) String added in v1.0.16

func (i StdDists) String() string

func (*StdDists) UnmarshalJSON added in v1.0.16

func (ev *StdDists) UnmarshalJSON(b []byte) error
