Documentation
¶
Overview ¶
Package minhash implements the bottom-k sketch for streaming set similarity.
For more information,
http://research.neustar.biz/2012/07/09/sketch-of-the-day-k-minimum-values/ MinHashing: http://infolab.stanford.edu/~ullman/mmds/ch3.pdf https://en.wikipedia.org/wiki/MinHash BottomK: http://www.math.tau.ac.il/~haimk/papers/p225-cohen.pdf http://cohenwang.org/edith/Papers/metrics394-cohen.pdf http://www.mpi-inf.mpg.de/~rgemulla/publications/beyer07distinct.pdf
This package works best when provided with a strong 64-bit hash function, such as CityHash, Spooky, Murmur3, or SipHash.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func SimilarityBbit ¶
SimilarityBbit computes an estimate for the similarity between two b-bit signatures
Types ¶
type BottomK ¶
type BottomK struct {
// contains filtered or unexported fields
}
BottomK is a bottom-k sketch of a set
func NewBottomK ¶
NewBottomK returns a new BottomK implementation.
func (*BottomK) Cardinality ¶
Cardinality estimates the cardinality of the set
func (*BottomK) Merge ¶
Merge combines the signatures of the second set, creating the signature of their union.
func (*BottomK) Similarity ¶
Similarity computes an estimate for the similarity between the two sets.
type MinWise ¶
type MinWise struct {
// contains filtered or unexported fields
}
MinWise is a collection of minimum hashes for a set
func NewMinWise ¶
NewMinWise returns a new MinWise Hashing implementation
func NewMinWiseFromSignatures ¶
NewMinWiseFromSignatures returns a new MinWise Hashing implementation using a user-provided set of signatures
func (*MinWise) Cardinality ¶
Cardinality estimates the cardinality of the set
func (*MinWise) Merge ¶
Merge combines the signatures of the second set, creating the signature of their union.
func (*MinWise) SignatureBbit ¶
SignatureBbit returns a b-bit reduction of the signature. This will result in unused bits at the high-end of the words if b does not divide 64 evenly.
func (*MinWise) Similarity ¶
Similarity computes an estimate for the similarity between the two sets.