elkans

package
v1.1.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 2, 2024 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func L2Distance

func L2Distance(v1, v2 *mat.VecDense) float64

L2Distance is used for L2Distance distance in Euclidean Kmeans.

func NewKMeans

func NewKMeans(vectors [][]float64, clusterCnt,
	maxIterations int, deltaThreshold float64,
	distanceType kmeans.DistanceType, initType kmeans.InitType,
) (kmeans.Clusterer, error)

func SphericalDistance added in v1.1.1

func SphericalDistance(v1, v2 *mat.VecDense) float64

SphericalDistance is used for InnerProduct and CosineDistance in Spherical Kmeans. NOTE: spherical distance between two points on a sphere is equal to the angular distance between the two points, scaled by pi. Refs: https://en.wikipedia.org/wiki/Great-circle_distance#Vector_version

Types

type ElkanClusterer

type ElkanClusterer struct {
	// contains filtered or unexported fields
}

ElkanClusterer is an improved kmeans algorithm which using the triangle inequality to reduce the number of distance calculations. As quoted from the paper: "The main contribution of this paper is an optimized version of the standard k-means method, with which the number of distance computations is in practice closer to `n` than to `nke`, where n is the number of vectors, k is the number of centroids, and e is the number of iterations needed until convergence."

However, during each iteration of the algorithm, the lower bounds l(x, c) are updated for all points x and centers c. These updates take O(nk) time, so the complexity of the algorithm remains at least O(nke), even though the number of distance calculations is roughly O(n) only. NOTE that, distance calculation is very expensive for higher dimension vectors.

Ref Paper: https://cdn.aaai.org/ICML/2003/ICML03-022.pdf

func (*ElkanClusterer) Cluster

func (km *ElkanClusterer) Cluster() ([][]float64, error)

Cluster returns the final centroids and the error if any.

func (*ElkanClusterer) InitCentroids

func (km *ElkanClusterer) InitCentroids()

InitCentroids initializes the centroids using initialization algorithms like random or kmeans++. Right now, we don't support kmeans++ since it is very slow for large scale clustering.

func (*ElkanClusterer) Normalize

func (km *ElkanClusterer) Normalize()

Normalize is required for spherical kmeans initialization.

func (*ElkanClusterer) SSE

func (km *ElkanClusterer) SSE() float64

SSE returns the sum of squared errors.

type Initializer

type Initializer interface {
	InitCentroids(vectors []*mat.VecDense, k int) (centroids []*mat.VecDense)
}

func NewRandomInitializer

func NewRandomInitializer() Initializer

type Random

type Random struct {
	// contains filtered or unexported fields
}

Random initializes the centroids with random centroids from the vector list. As mentioned in the FAISS discussion here https://github.com/facebookresearch/faiss/issues/268#issuecomment-348184505 "We have not implemented Kmeans++ in Faiss, because with our former Yael library, which implements both k-means++ and regular random initialization, we observed that the overhead computational cost was not worth the saving (negligible) in all large-scale settings we have considered."

func (*Random) InitCentroids

func (r *Random) InitCentroids(vectors []*mat.VecDense, k int) (centroids []*mat.VecDense)

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL