minhash

package
v1.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 22, 2024 License: MIT Imports: 7 Imported by: 1

Documentation

Overview

Package minhash provides a min-hash collection for approximating Jaccard similarity.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type MinHash

type MinHash[T constraints.Integer] struct {
	// contains filtered or unexported fields
}

A MinHash is a min-hash collection. Retains the k lowest unique values out of all the values that were added to it.

func New

func New[T constraints.Integer](k int) *MinHash[T]

New returns an empty collection that stores k values.

func (*MinHash[T]) Frozen added in v0.6.0

func (mh *MinHash[T]) Frozen() *MinHash[T]

Frozen returns an immutable version of this instance. The original instance is unchanged.

Frozen instances are sorted, take up less memory and calculate Jaccard faster. Calls to View are slower because the data is cloned.

func (*MinHash[T]) Jaccard

func (mh *MinHash[T]) Jaccard(other *MinHash[T]) float64

Jaccard returns the approximated Jaccard similarity between mh and other.

Sort needs to be called before calling this function.

func (*MinHash[T]) K

func (mh *MinHash[T]) K() int

K returns the maximal number of elements in mh.

func (*MinHash[T]) MarshalJSON

func (mh *MinHash[T]) MarshalJSON() ([]byte, error)

MarshalJSON implements the json.Marshaler interface.

func (*MinHash[T]) N

func (mh *MinHash[T]) N() int

N returns the number of calls that were made to Push. Represents the size of the original set.

func (*MinHash[T]) Push

func (mh *MinHash[T]) Push(x T) bool

Push tries to add a hash to the collection. x is added only if it does not already exist, and there are less than k elements lesser than x. Returns true if x was added and false if not.

func (*MinHash[T]) SoftJaccard

func (mh *MinHash[T]) SoftJaccard(other *MinHash[T]) float64

SoftJaccard returns the Jaccard similarity between mh and other, adding one agreed upon element and one disagreed upon element to the calculation.

Sort needs to be called before calling this function.

func (*MinHash[T]) Sort

func (mh *MinHash[T]) Sort()

Sort sorts the collection, making it ready for Jaccard calculation. The collection is still valid after calling Sort.

func (*MinHash[T]) UnmarshalJSON

func (mh *MinHash[T]) UnmarshalJSON(b []byte) error

UnmarshalJSON implements the json.Unmarshaler interface.

func (*MinHash[T]) View

func (mh *MinHash[T]) View() []T

View returns the underlying slice of values.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL