elkans

package

v1.1.0 Latest Latest Go to latest Published: Dec 29, 2023 License: Apache-2.0 Imports: 8 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/matrixorigin/matrixone

Links

Open Source Insights

Documentation ¶

Index ¶

func L2Distance(v1, v2 *mat.VecDense) float64
func NewKMeans(vectors [][]float64, clusterCnt, maxIterations int, deltaThreshold float64, ...) (kmeans.Clusterer, error)
type ElkanClusterer
type Initializer
- func NewKMeansPlusPlusInitializer(distFn kmeans.DistanceFunction) Initializer
- func NewRandomInitializer() Initializer
type KMeansPlusPlus
- func (kpp *KMeansPlusPlus) InitCentroids(vectors []*mat.VecDense, k int) (centroids []*mat.VecDense)
type Random
- func (r *Random) InitCentroids(vectors []*mat.VecDense, k int) (centroids []*mat.VecDense)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func L2Distance ¶

func L2Distance(v1, v2 *mat.VecDense) float64

L2Distance is used for L2Distance distance in Euclidean Kmeans.

func NewKMeans ¶

func NewKMeans(vectors [][]float64, clusterCnt,
	maxIterations int, deltaThreshold float64,
	distanceType kmeans.DistanceType, initType kmeans.InitType,
) (kmeans.Clusterer, error)

Types ¶

type ElkanClusterer ¶

type ElkanClusterer struct {
	// contains filtered or unexported fields
}

ElkanClusterer is an improved kmeans algorithm which using the triangle inequality to reduce the number of distance calculations. As quoted from the paper: "The main contribution of this paper is an optimized version of the standard k-means method, with which the number of distance computations is in practice closer to `n` than to `nke`, where n is the number of vectors, k is the number of centroids, and e is the number of iterations needed until convergence."

However, during each iteration of the algorithm, the lower bounds l(x, c) are updated for all points x and centers c. These updates take O(nk) time, so the complexity of the algorithm remains at least O(nke), even though the number of distance calculations is roughly O(n) only. NOTE that, distance calculation is very expensive for higher dimension vectors.

Ref Paper: https://cdn.aaai.org/ICML/2003/ICML03-022.pdf

func (*ElkanClusterer) Cluster ¶

func (km *ElkanClusterer) Cluster() ([][]float64, error)

Cluster returns the final centroids and the error if any.

func (*ElkanClusterer) InitCentroids ¶

func (km *ElkanClusterer) InitCentroids() error

InitCentroids initializes the centroids using initialization algorithms like random or kmeans++.

func (*ElkanClusterer) Normalize ¶

func (km *ElkanClusterer) Normalize()

Normalize is required for spherical kmeans initialization.

func (*ElkanClusterer) SSE ¶

func (km *ElkanClusterer) SSE() float64

SSE returns the sum of squared errors.

type Initializer ¶

type Initializer interface {
	InitCentroids(vectors []*mat.VecDense, k int) (centroids []*mat.VecDense)
}

func NewKMeansPlusPlusInitializer ¶

func NewKMeansPlusPlusInitializer(distFn kmeans.DistanceFunction) Initializer

func NewRandomInitializer ¶

func NewRandomInitializer() Initializer

type KMeansPlusPlus ¶

type KMeansPlusPlus struct {
	// contains filtered or unexported fields
}

KMeansPlusPlus initializes the centroids using kmeans++ algorithm. Complexity: O(k*n*k); n = number of vectors, k = number of clusters Ref Paper: https://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf The reason why we have kmeans++ is that it is more stable than random initialization. For example, we have 3 clusters. Using random, we could get 3 centroids: 1&2 which are close to each other and part of cluster 1. 3 is in the middle of 2&3. Using kmeans++, we are sure that 3 centroids are farther away from each other.

func (*KMeansPlusPlus) InitCentroids ¶

func (kpp *KMeansPlusPlus) InitCentroids(vectors []*mat.VecDense, k int) (centroids []*mat.VecDense)

type Random ¶

type Random struct {
	// contains filtered or unexported fields
}

Random initializes the centroids with random centroids from the vector list.

func (*Random) InitCentroids ¶

func (r *Random) InitCentroids(vectors []*mat.VecDense, k int) (centroids []*mat.VecDense)

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL