kmeans

v1.0.0
Published: Oct 3, 2024 License: MIT Imports: 5 Imported by: 0

README

kmeans

The kmeans package provides a modern Go implementation of the k-means clustering algorithm.

This package distinguishes itself from other k-means clustering packages through its use of generic data types and iterators. These language features make it easier to integrate clustering with existing code and structs, without explicitly copying existing objects into arrays.

Use of iterators of observations (e.g. iter.Seq[Observation[float64]]) also provides a mechanism that allows data to be delivered incrementally.

See the 3D scatter plot for a demo visualization. It can be regenerated using go run ./cmd/demo.go -k2 5 ; open scatter3d.html

Usage

Define your observation

Provide a struct that implements Values(i int) T where T can be any Number type.

type person struct {
	name          string
	birthDate     time.Time
	height_inches int
	weight_lbs    int
	gender        int
}

func newPerson(name string, birthDate string, height int, weight int, gender int) person {
	d, _ := time.Parse(time.DateOnly, birthDate)
	return person{
		name:          name,
		birthDate:     d,
		height_inches: height,
		weight_lbs:    weight,
		gender:        gender,
	}
}

func (p person) age() float64 {
	// Approximate age in fractional years, ignoring leap days.
	return time.Since(p.birthDate).Hours() / (24 * 365)
}


func (p person) Values(i int) float64 {
	switch i {
	case 0:
		return float64(p.weight_lbs)
	case 1:
		return p.age()
	case 2:
		return float64(p.height_inches)
	case 3:
		return float64(p.gender)
	}
	return 0.0
}

Implement your observation collection. This collection is used to populate the clusters.

type peopleObservations []person

func (p peopleObservations) Observations() iter.Seq[kmeans.Observation[float64]] {
	return func(yield func(kmeans.Observation[float64]) bool) {
		for _, o := range p {
			// person implements kmeans.Observation[float64], so yield it directly.
			if !yield(o) {
				return
			}
		}
	}
}

func (p peopleObservations) Degree() int {
	return 4
}

Build ClusterObservations

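	var po peopleObservations
	// Generate k clusters
	cc, err := kmeans.OptimizeClusters(k, po)
	if err != nil {
		log.Fatal(err)
	}
	// Largest returns an int, taken here to be the index of the largest cluster
	largestIndex := cc.Largest()
	// Print the observations in the biggest cluster
	for o := range cc[largestIndex].Observations.All() {
		fmt.Println(o)
	}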

Provide additional objects and find the best cluster to associate them with

    p := newPerson("kamala", "1964-10-20", 64, 130, 1)
    clusterIndex := cc.Nearest(p)
    fmt.Println("Best cluster is ", clusterIndex)

Documentation

Constants

This section is empty.

Variables

var ErrEmptyObservations error = fmt.Errorf("empty observation, there is no mean for an empty set of points")
var ErrKMustBeGreaterThanZero = fmt.Errorf("k must be greater than 0")

Functions

func AverageDistance

func AverageDistance[T Number](o Observation[T], observations iter.Seq[Observation[T]], degree int) float64

AverageDistance returns the average distance between o and all observations

func Center

func Center[T Number](os iter.Seq[Observation[T]], degree int) ([]T, error)

func Distance

func Distance[T Number](o1, o2 Observation[T], degree int) float64

Distance returns the Euclidean distance between two observations, computed over their first degree values.
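For reference, a Euclidean distance over the Values(i) accessor can be sketched as follows. This is an illustrative stand-in rather than the package's source: Number and Observation are redeclared locally so the block compiles on its own, and vec and euclidean are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// Number and Observation copy the shapes shown in the package docs.
type Number interface {
	int | int8 | int16 | int32 | int64 |
		uint | uint8 | uint16 | uint32 | uint64 |
		float32 | float64
}

type Observation[T Number] interface {
	Values(i int) T
}

// vec is a hypothetical slice-backed observation.
type vec []float64

func (v vec) Values(i int) float64 { return v[i] }

// euclidean sums squared per-index differences over the first degree values
// and takes the square root. Values are converted to float64 before
// subtracting so unsigned types cannot underflow.
func euclidean[T Number](a, b Observation[T], degree int) float64 {
	var sum float64
	for i := 0; i < degree; i++ {
		d := float64(a.Values(i)) - float64(b.Values(i))
		sum += d * d
	}
	return math.Sqrt(sum)
}

func main() {
	fmt.Println(euclidean[float64](vec{0, 0}, vec{3, 4}, 2)) // 5
}
```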

func ObservationRange

func ObservationRange[T Number](oo iter.Seq[Observation[T]], degree int) (observationMins []T, observationMax []T)

ObservationRange returns two slices, each the same size as the Degree of the observations. Entry i of the first slice is the minimum, and entry i of the second slice is the maximum, of Values(i) across all observations.

func ObservationSum

func ObservationSum[T Number](oo iter.Seq[Observation[T]], degree int) (sum []T, count int)

ObservationSum returns a slice the same size as the Degree of the observations. Entry i of the slice is the sum of Values(i) across all observations. The number of observations is also returned.

Types

type Cluster

type Cluster[T Number] struct {
	Center       Observation[T]
	Observations *ObservationList[T]
}

A Cluster is a center around which data points gravitate.

func (*Cluster[T]) Append

func (c *Cluster[T]) Append(o Observation[T])

Append adds an observation to the Cluster

func (*Cluster[T]) MostCentral

func (c *Cluster[T]) MostCentral() Observation[T]

func (*Cluster[T]) Recenter

func (c *Cluster[T]) Recenter()

Recenter updates the center of the cluster.

func (*Cluster[T]) SumOfDistance

func (c *Cluster[T]) SumOfDistance() float64

SumOfDistance computes the sum of the distances of all the observations from the center of the cluster.

type Clusters

type Clusters[T Number] []Cluster[T]

Clusters is a slice of clusters

func New

func New[T Number](k int, dataset Observations[T]) (Clusters[T], error)

New sets up a new set of clusters and randomly seeds their initial positions
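Random seeding can be sketched as picking k distinct observations to serve as initial centers. The version below is only an assumption about the general technique, not the package's code: it works on a plain slice rather than an Observations sequence, and seed is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// seed picks k distinct items from data to use as initial centers.
// rand.Perm yields a random ordering of indices, so taking the first k
// guarantees no item is chosen twice.
func seed[T any](data []T, k int) ([]T, error) {
	if k <= 0 {
		return nil, errors.New("k must be greater than 0")
	}
	if k > len(data) {
		return nil, errors.New("k exceeds the number of observations")
	}
	centers := make([]T, 0, k)
	for _, idx := range rand.Perm(len(data))[:k] {
		centers = append(centers, data[idx])
	}
	return centers, nil
}

func main() {
	data := [][]float64{{1, 1}, {2, 2}, {8, 9}, {9, 8}}
	centers, err := seed(data, 2)
	fmt.Println(centers, err) // the chosen centers vary from run to run
}
```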

func OptimizeClusters

func OptimizeClusters[T Number](k int, dataset Observations[T]) (Clusters[T], error)

OptimizeClusters generates k clusters from the observations in dataset, as shown in the README usage above.

func (Clusters[T]) Largest

func (c Clusters[T]) Largest() int

func (Clusters[T]) Nearest

func (c Clusters[T]) Nearest(point Observation[T]) int

Nearest returns the index of the cluster nearest to point
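The nearest-cluster lookup amounts to an argmin over distances to each center. A simplified sketch, using plain float64 slices instead of the package's Cluster and Observation types (nearest is a hypothetical name):

```go
package main

import (
	"fmt"
	"math"
)

// nearest returns the index of the center closest to point, or -1 when
// centers is empty. Squared distances are enough for comparison, so the
// square root is skipped.
func nearest(centers [][]float64, point []float64) int {
	best, bestDist := -1, math.Inf(1)
	for i, c := range centers {
		var d float64
		for j := range point {
			diff := c[j] - point[j]
			d += diff * diff
		}
		if d < bestDist {
			best, bestDist = i, d
		}
	}
	return best
}

func main() {
	centers := [][]float64{{0, 0}, {10, 10}, {5, 0}}
	fmt.Println(nearest(centers, []float64{6, 1})) // 2
}
```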

func (Clusters[T]) Neighbor

func (c Clusters[T]) Neighbor(point Observation[T], fromCluster int) (int, float64)

Neighbor returns the neighboring cluster of a point along with the average distance to its points

func (Clusters[T]) Recenter

func (c Clusters[T]) Recenter()

Recenter updates all cluster centers

func (Clusters[T]) Reset

func (c Clusters[T]) Reset()

Reset clears all point assignments

func (Clusters[T]) Smallest

func (c Clusters[T]) Smallest() int

func (Clusters[T]) SumClusterVariance

func (c Clusters[T]) SumClusterVariance() float64

type NormalizeObservationAdapter

type NormalizeObservationAdapter[T Number] struct {
	// contains filtered or unexported fields
}

func NewNormalizeObservationAdapter

func NewNormalizeObservationAdapter[T Number](oo Observations[T], scale []T) *NormalizeObservationAdapter[T]

func (*NormalizeObservationAdapter[T]) Degree

func (n *NormalizeObservationAdapter[T]) Degree() int

func (NormalizeObservationAdapter[T]) Denormalize

func (n NormalizeObservationAdapter[T]) Denormalize(o Observation[T]) []T

func (NormalizeObservationAdapter[T]) Normalize

func (n NormalizeObservationAdapter[T]) Normalize(ov Observation[T]) []T

Normalize transforms the observation vector to values between 0 and 1 based on the initial population of data provided. It also applies scaling.
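Min-max normalization of this kind can be sketched as below. How the adapter applies its scale argument is not documented here, so multiplying each normalized value by a per-index factor is an assumption, as are the names normalize and denormalize and the use of plain slices.

```go
package main

import "fmt"

// normalize maps each value into [0, 1] using the observed min and max for
// its index, then multiplies by a per-index scale (an assumed interpretation
// of "scaling"). A constant column (max == min) maps to 0.
func normalize(v, mins, maxs, scale []float64) []float64 {
	out := make([]float64, len(v))
	for i := range v {
		if span := maxs[i] - mins[i]; span != 0 {
			out[i] = (v[i] - mins[i]) / span * scale[i]
		}
	}
	return out
}

// denormalize inverts normalize for indexes with a non-zero scale.
func denormalize(n, mins, maxs, scale []float64) []float64 {
	out := make([]float64, len(n))
	for i := range n {
		out[i] = mins[i]
		if scale[i] != 0 {
			out[i] += n[i] / scale[i] * (maxs[i] - mins[i])
		}
	}
	return out
}

func main() {
	mins, maxs := []float64{0, 0}, []float64{10, 20}
	scale := []float64{1, 2} // weight the second index twice as heavily
	n := normalize([]float64{5, 10}, mins, maxs, scale)
	fmt.Println(n, denormalize(n, mins, maxs, scale)) // [0.5 1] [5 10]
}
```

Normalizing before clustering keeps an index with a large numeric range, such as weight in pounds, from dominating the distance calculation.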

func (NormalizeObservationAdapter[T]) Observations

func (n NormalizeObservationAdapter[T]) Observations() iter.Seq[Observation[T]]

type NormalizedObservation

type NormalizedObservation[T, O Number] struct {
	Original *Observation[O]
	// contains filtered or unexported fields
}

NormalizedObservation makes a copy of the provided Observation while maintaining a reference to the original, unmodified observation.

func (NormalizedObservation[T, O]) Values

func (o NormalizedObservation[T, O]) Values(i int) T

type Number

type Number interface {
	int | int8 | int16 | int32 | int64 |
		uint | uint8 | uint16 | uint32 | uint64 |
		float32 | float64
}

Number is the set of numeric types that can be used for the values of a generic observation.

type Observation

type Observation[T Number] interface {
	Values(i int) T
}

An Observation exposes its values by index through Values(i). The number of values and the meaning of each index must be the same for all observations in a set. Users may either implement the Observation interface directly or create a type or struct that wraps it. Values may be copied but are never modified.

func SelectRandomObservations

func SelectRandomObservations[T Number](oo Observations[T], k int) []Observation[T]

type ObservationList

type ObservationList[T Number] struct {
	ClusterObservations []Observation[T]
	// contains filtered or unexported fields
}

func NewObservationList

func NewObservationList[T Number](degree int) *ObservationList[T]

func (*ObservationList[T]) All

func (o *ObservationList[T]) All() iter.Seq[Observation[T]]

func (*ObservationList[T]) Append

func (o *ObservationList[T]) Append(v Observation[T])

func (*ObservationList[T]) Degree

func (o *ObservationList[T]) Degree() int

type Observations

type Observations[T Number] interface {
	Observations() iter.Seq[Observation[T]]
	// Degree is the number of values in each observation
	Degree() int
}

Observations is a collection of Observation objects. It is provided as a sequence so that implementers are not required to use a slice.

func NormalizeObservations

func NormalizeObservations[O Number](oo Observations[O]) Observations[float64]
