Documentation ¶
Overview ¶
Package minhash provides a min-hash collection for approximating Jaccard similarity.
Index ¶
- type MinHash
- func New[T constraints.Integer](k int) *MinHash[T]
- func (mh *MinHash[T]) Frozen() *MinHash[T]
- func (mh *MinHash[T]) Jaccard(other *MinHash[T]) float64
- func (mh *MinHash[T]) K() int
- func (mh *MinHash[T]) MarshalJSON() ([]byte, error)
- func (mh *MinHash[T]) N() int
- func (mh *MinHash[T]) Push(x T) bool
- func (mh *MinHash[T]) SoftJaccard(other *MinHash[T]) float64
- func (mh *MinHash[T]) Sort()
- func (mh *MinHash[T]) UnmarshalJSON(b []byte) error
- func (mh *MinHash[T]) View() []T
Types ¶
type MinHash ¶
type MinHash[T constraints.Integer] struct {
	// contains filtered or unexported fields
}
A MinHash is a min-hash collection. It retains the k lowest unique values out of all the values that were added to it.
func New ¶
func New[T constraints.Integer](k int) *MinHash[T]
New returns an empty collection that stores up to k values.
func (*MinHash[T]) Frozen ¶ added in v0.6.0
Frozen returns an immutable version of this instance. The original instance is unchanged.
Frozen instances are sorted, take up less memory and calculate Jaccard faster. Calls to View are slower because the data is cloned.
func (*MinHash[T]) Jaccard ¶
Jaccard returns the approximated Jaccard similarity between mh and other.
Sort needs to be called before calling this function.
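The following is a minimal sketch of how a Jaccard estimate can be derived from two sorted collections of lowest hashes. It uses the standard bottom-k estimator: look at the k smallest values of the union and count how many appear in both inputs. The function name jaccardEstimate and its parameters are hypothetical, and whether this package computes its estimate in exactly this way is an assumption.

```go
package main

import "fmt"

// jaccardEstimate is a hypothetical helper, not part of the package.
// a and b must be ascending and free of duplicates. It inspects the k
// smallest values of their union and returns the fraction of those
// values that appear in both slices.
func jaccardEstimate(a, b []uint64, k int) float64 {
	i, j := 0, 0
	shared, seen := 0, 0
	for seen < k && i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			shared++
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
		seen++
	}
	// If one slice ran out early, the remaining values of the other
	// slice belong to the union but are not shared.
	rest := (len(a) - i) + (len(b) - j)
	if rest > k-seen {
		rest = k - seen
	}
	seen += rest
	if seen == 0 {
		return 0
	}
	return float64(shared) / float64(seen)
}

func main() {
	a := []uint64{1, 3, 5, 7}
	b := []uint64{1, 2, 5, 8}
	// The 4 smallest union values are 1, 2, 3, 5; two of them are shared.
	fmt.Println(jaccardEstimate(a, b, 4)) // prints 0.5
}
```

The merge walks both inputs once, which is why sorted input matters for this style of calculation.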
func (*MinHash[T]) MarshalJSON ¶
MarshalJSON implements the json.Marshaler interface.
func (*MinHash[T]) N ¶
N returns the number of calls that were made to Push. It represents the size of the original set.
func (*MinHash[T]) Push ¶
Push tries to add a hash to the collection. x is added only if it does not already exist and fewer than k retained elements are smaller than x. It returns true if x was added and false otherwise.
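To illustrate the Push contract, here is a small sketch that keeps the k lowest unique values in a sorted slice. The type kLowest and its push method are hypothetical stand-ins written for this example. They are not the package's implementation, which may use a different internal structure.

```go
package main

import (
	"fmt"
	"sort"
)

// kLowest is a hypothetical stand-in for MinHash. It keeps at most k
// unique values, always the smallest ones seen so far.
type kLowest struct {
	k    int
	vals []uint64 // kept in ascending order
}

// push follows the documented rule: x is kept only if it is not already
// present and fewer than k retained values are smaller than x.
func (s *kLowest) push(x uint64) bool {
	// Index of the first retained value that is >= x.
	i := sort.Search(len(s.vals), func(j int) bool { return s.vals[j] >= x })
	if i < len(s.vals) && s.vals[i] == x {
		return false // duplicate
	}
	if i >= s.k {
		return false // k smaller values are already retained
	}
	// Insert x at position i, then drop the largest value if over capacity.
	s.vals = append(s.vals, 0)
	copy(s.vals[i+1:], s.vals[i:])
	s.vals[i] = x
	if len(s.vals) > s.k {
		s.vals = s.vals[:s.k]
	}
	return true
}

func main() {
	s := &kLowest{k: 3}
	for _, x := range []uint64{50, 10, 40, 10, 20, 60} {
		fmt.Println(x, s.push(x))
	}
	fmt.Println(s.vals) // prints [10 20 40]
}
```

In this run the second 10 is rejected as a duplicate, 20 evicts 50, and 60 is rejected because three smaller values are already retained.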
func (*MinHash[T]) SoftJaccard ¶
SoftJaccard returns the approximated Jaccard similarity between mh and other, adding one agreed-upon element and one disagreed-upon element to the calculation.
Sort needs to be called before calling this function.
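One way to read the description above is as additive smoothing: the agreed-upon element adds one to both the shared count and the total, and the disagreed-upon element adds one to the total only. The sketch below shows that reading. The function softJaccard and its parameters are hypothetical, and the exact formula used by the package is an assumption.

```go
package main

import "fmt"

// softJaccard is a hypothetical helper, not part of the package.
// shared is the number of inspected values found in both collections and
// total is the number of inspected values overall. One agreed-upon and
// one disagreed-upon pseudo-element are added before dividing.
func softJaccard(shared, total int) float64 {
	return float64(shared+1) / float64(total+2)
}

func main() {
	fmt.Println(softJaccard(0, 0)) // no evidence: prints 0.5
	fmt.Println(softJaccard(2, 4)) // 3/6: prints 0.5
	fmt.Println(softJaccard(0, 4) < softJaccard(4, 4))
}
```

Under this reading the result is never exactly 0 or 1, and two empty collections score 0.5 instead of dividing by zero.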
func (*MinHash[T]) Sort ¶
func (mh *MinHash[T]) Sort()
Sort sorts the collection, making it ready for Jaccard calculation. The collection is still valid after calling Sort.
func (*MinHash[T]) UnmarshalJSON ¶
UnmarshalJSON implements the json.Unmarshaler interface.