minhash

package
v0.0.0-...-1614a31 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 20, 2018 License: MIT Imports: 6 Imported by: 0

README

Min-Hash

Introduction:

MinHash (the min-wise independent permutations locality-sensitive hashing scheme) is a technique for quickly estimating how similar two sets are.
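To make the idea concrete, here is a minimal, self-contained sketch of the general MinHash technique. It is not this package's implementation: the function names `signature` and `estimate` are hypothetical, and seeding FNV-1a with the bucket index is an assumption chosen only to keep the example short. Each bucket keeps the smallest hash value seen under its own hash function, and the fraction of buckets on which two signatures agree estimates the Jaccard similarity of the underlying sets.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"math"
)

// signature builds a toy MinHash signature with k buckets.
// Bucket i holds the minimum of hash_i(x) over all elements x, where
// hash_i is FNV-1a fed the bucket index as a prefix (an illustrative
// choice, not necessarily what the minhash package does).
func signature(set []string, k int) []uint64 {
	sig := make([]uint64, k)
	var seed [8]byte
	for i := range sig {
		sig[i] = math.MaxUint64
		binary.LittleEndian.PutUint64(seed[:], uint64(i))
		for _, x := range set {
			h := fnv.New64a()
			h.Write(seed[:])
			h.Write([]byte(x))
			if v := h.Sum64(); v < sig[i] {
				sig[i] = v
			}
		}
	}
	return sig
}

// estimate returns the fraction of positions on which two signatures
// agree. It assumes both signatures are non-empty and of equal length.
func estimate(a, b []uint64) float64 {
	match := 0
	for i := range a {
		if a[i] == b[i] {
			match++
		}
	}
	return float64(match) / float64(len(a))
}

func main() {
	a := signature([]string{"apple", "banana", "cherry", "date"}, 128)
	b := signature([]string{"banana", "cherry", "date", "elderberry"}, 128)
	// The true Jaccard similarity of these sets is 3/5; the printed
	// value is only an estimate and depends on the hash functions.
	fmt.Printf("estimated similarity: %.3f\n", estimate(a, b))
}
```

More buckets reduce the variance of the estimate at the cost of a larger signature, which is the trade-off behind the bucket count passed to `New`.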

Installation

To get the package, use the standard go get command:

go get github.com/bsc-s2/gokit/minhash

Usage

Calculate the similarity of two sets

To create two MinHashes with 128 buckets:

h0 := minhash.New(128)

h1 := minhash.New(128)

Add the elements of set0 and set1 (each element is a []byte):

for i := range set0 {
    h0.Add(set0[i])
}

for i := range set1 {
    h1.Add(set1[i])
}

Get the estimated similarity of the two sets from their MinHashes h0 and h1:

similarity, err := h0.GetSimilarity(h1)
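MinHash in general estimates the Jaccard similarity, |A ∩ B| / |A ∪ B|. The README does not state which measure GetSimilarity returns, so treating its result as a Jaccard estimate is an assumption. For small sets, an exact value is cheap to compute and makes a useful reference when checking an estimate. The following sketch uses only the standard library; the helper name `jaccard` is hypothetical and not part of this package.

```go
package main

import "fmt"

// jaccard returns the exact Jaccard similarity of two string sets.
// Duplicate elements within a slice are counted once. Two empty sets
// are treated as identical (similarity 1), which is a convention
// chosen for this sketch.
func jaccard(a, b []string) float64 {
	inA := make(map[string]bool, len(a))
	for _, x := range a {
		inA[x] = true
	}
	inB := make(map[string]bool, len(b))
	for _, x := range b {
		inB[x] = true
	}
	inter := 0
	for x := range inA {
		if inB[x] {
			inter++
		}
	}
	union := len(inA) + len(inB) - inter
	if union == 0 {
		return 1
	}
	return float64(inter) / float64(union)
}

func main() {
	set0 := []string{"a", "b", "c"}
	set1 := []string{"b", "c", "d"}
	// Intersection {b, c} has 2 elements; union {a, b, c, d} has 4.
	fmt.Println(jaccard(set0, set1)) // prints 0.5
}
```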

Documentation

See the associated GoDoc

Example

See func TestMinHashError(t *testing.T) in minhash_test.go

More Details

Documentation

Types

type MinHash

type MinHash struct {
	// contains filtered or unexported fields
}

MinHash stores a set of min hashes, one per bucket.

func New

func New(bucketCnt int) *MinHash

New creates a new MinHash with bucketCnt buckets.

func (*MinHash) Add

func (h *MinHash) Add(element []byte)

Add adds an element to the MinHash.

func (*MinHash) AddString

func (h *MinHash) AddString(element string)

func (*MinHash) Get

func (h *MinHash) Get(pos int) uint64

Get returns the value stored at position pos in the MinHash.

func (*MinHash) GetSignature

func (h *MinHash) GetSignature() []uint64

GetSignature returns a signature for the set.

func (*MinHash) GetSimilarity

func (h *MinHash) GetSimilarity(h1 *MinHash) (float64, error)

GetSimilarity returns the estimated similarity of two MinHashes, h and h1.

func (*MinHash) Marshal

func (h *MinHash) Marshal() (b []byte, err error)

Marshal converts the MinHash to bytes.

func (*MinHash) ReadFrom

func (h *MinHash) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom reads a binary representation of the MinHash (such as might have been written by WriteTo()) from an i/o stream. It returns the number of bytes read.

func (*MinHash) Set

func (h *MinHash) Set(pos int, value uint64)

Set stores value at the specified position pos.

func (*MinHash) Unmarshal

func (h *MinHash) Unmarshal(b []byte) (n int64, err error)

Unmarshal converts bytes to a MinHash.

func (*MinHash) WriteTo

func (h *MinHash) WriteTo(w io.Writer) (n int64, err error)

WriteTo writes a binary representation of the MinHash to an i/o stream. It returns the number of bytes written.
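The page does not document the binary layout that WriteTo and ReadFrom use. As an illustration of the general pattern only, the sketch below round-trips a `[]uint64` signature through an `io.Writer` and `io.Reader` using a hypothetical layout: a little-endian uint32 count followed by the values. The names `writeSig`, `readSig`, and `roundTrip` and the layout itself are assumptions, not this package's actual format.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writeSig writes a count followed by the signature values and returns
// the number of bytes written. On error it returns 0 rather than
// tracking partial writes, to keep the sketch short.
func writeSig(w io.Writer, sig []uint64) (int64, error) {
	if err := binary.Write(w, binary.LittleEndian, uint32(len(sig))); err != nil {
		return 0, err
	}
	if err := binary.Write(w, binary.LittleEndian, sig); err != nil {
		return 0, err
	}
	return int64(4 + 8*len(sig)), nil
}

// readSig reads a signature written by writeSig and returns it with
// the number of bytes read. A production reader would also bound the
// count before allocating.
func readSig(r io.Reader) ([]uint64, int64, error) {
	var n uint32
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return nil, 0, err
	}
	sig := make([]uint64, n)
	if err := binary.Read(r, binary.LittleEndian, sig); err != nil {
		return nil, 0, err
	}
	return sig, int64(4 + 8*int(n)), nil
}

// roundTrip writes sig to an in-memory buffer and reads it back,
// returning the decoded values and both byte counts.
func roundTrip(sig []uint64) ([]uint64, int64, int64, error) {
	var buf bytes.Buffer
	nw, err := writeSig(&buf, sig)
	if err != nil {
		return nil, nw, 0, err
	}
	out, nr, err := readSig(&buf)
	return out, nw, nr, err
}

func main() {
	out, nw, nr, err := roundTrip([]uint64{7, 42, 1 << 40})
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(out, nw, nr)
}
```

With the real package, the same round trip would go through h.WriteTo(w) on one MinHash and h.ReadFrom(r) on another, using the signatures listed above.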
