package minhash

Version: v0.0.0-...-cbd5ca3
Warning

This package is not in the latest version of its module.

Published: May 25, 2023 | License: MIT | Imports: 5 | Imported by: 0

Documentation

Overview

Package minhash contains implementations of the KMV (K-Minimum Values, also called bottom-k) and KHF (K-Hash Functions) MinHash algorithms.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type IntHeap

type IntHeap []uint64

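IntHeap is a heap of uint64s that satisfies the container/heap interface (https://golang.org/pkg/container/heap/). Because Less orders larger values first, it behaves as a max-heap: the largest value sits at index 0.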

func (IntHeap) Len

func (IntHeap IntHeap) Len() int

func (IntHeap) Less

func (IntHeap IntHeap) Less(i, j int) bool

Less reports whether element i is larger than element j, so the largest value ends up at index 0 of the heap.

func (*IntHeap) Pop

func (IntHeap *IntHeap) Pop() interface{}

Pop removes and returns an element from the heap (call it through heap.Pop rather than directly).

func (*IntHeap) Push

func (IntHeap *IntHeap) Push(x interface{})

Push adds an element to the heap (call it through heap.Push rather than directly).

func (IntHeap) Swap

func (IntHeap IntHeap) Swap(i, j int)

type KHFsketch

type KHFsketch struct {
	KmerSize   uint     `json:"ksize"`
	Md5sum     string   `json:"md5sum"`
	Sketch     []uint64 `json:"mins"`
	SketchSize uint     `json:"num"`
	// contains filtered or unexported fields
}

KHFsketch is the K-Hash Functions MinHash sketch of a set

func NewKHFsketch

func NewKHFsketch(k, s uint) *KHFsketch

NewKHFsketch is the constructor for a KHFsketch

func (*KHFsketch) AddHash

func (KHFsketch *KHFsketch) AddHash(hv uint64)

AddHash is a method to evaluate a hash value and add any minimums to the sketch

func (*KHFsketch) GetAlgo

func (KHFsketch *KHFsketch) GetAlgo() string

GetAlgo is a method to return the sketching algorithm used

func (*KHFsketch) GetMD5

func (KHFsketch *KHFsketch) GetMD5() string

GetMD5 is a method to return the MD5 currently calculated for the sketch

func (*KHFsketch) GetSimilarity

func (KHFsketch *KHFsketch) GetSimilarity(mh2 MinHash) (float64, error)

GetSimilarity is a method to estimate the Jaccard similarity between two sketches.
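For K-Hash Functions sketches built with the same hash functions, the standard Jaccard estimate is the fraction of slots whose minimums are equal. A minimal sketch under that assumption follows; khfSimilarity is a hypothetical helper, not this package's API, and it returns an error for mismatched sizes to match the (float64, error) shape of GetSimilarity.

```go
package main

import (
	"errors"
	"fmt"
)

// khfSimilarity estimates Jaccard similarity from two K-Hash-Functions
// sketches as the fraction of slots whose minimums agree.
func khfSimilarity(a, b []uint64) (float64, error) {
	if len(a) != len(b) || len(a) == 0 {
		return 0, errors.New("sketches must be non-empty and the same length")
	}
	matches := 0
	for i := range a {
		if a[i] == b[i] {
			matches++
		}
	}
	return float64(matches) / float64(len(a)), nil
}

func main() {
	a := []uint64{3, 8, 1, 9}
	b := []uint64{3, 7, 1, 2}
	sim, err := khfSimilarity(a, b)
	fmt.Println(sim, err) // 0.5 <nil>: 2 of 4 slots match
}
```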

func (*KHFsketch) GetSketch

func (KHFsketch *KHFsketch) GetSketch() []uint64

GetSketch is a method to return the sketch held by a MinHash KHF sketch object

func (*KHFsketch) Merge

func (KHFsketch *KHFsketch) Merge(KHFsketch2 *KHFsketch)

Merge is a method to combine two MinHash objects.

TODO: this should check for consistency between MinHash objects.
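Assuming both sketches were built with the same hash functions, one common way to merge K-Hash Functions sketches is an element-wise minimum, which yields the sketch of the union of the two sets. The hypothetical mergeKHF below adds a size check as one example of the consistency test the TODO mentions; it is not the package's Merge implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// mergeKHF folds src into dst by keeping the smaller value in each slot,
// which gives the sketch of the union when both use the same hash functions.
// The length check is one example of a consistency check before merging.
func mergeKHF(dst, src []uint64) error {
	if len(dst) != len(src) {
		return errors.New("cannot merge sketches of different sizes")
	}
	for i, v := range src {
		if v < dst[i] {
			dst[i] = v
		}
	}
	return nil
}

func main() {
	a := []uint64{5, 2, 9}
	b := []uint64{4, 6, 9}
	if err := mergeKHF(a, b); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(a) // [4 2 9]
}
```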

func (*KHFsketch) SetMD5

func (KHFsketch *KHFsketch) SetMD5()

SetMD5 is a method to calculate and store the MD5 for the sketch
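The byte encoding that SetMD5 hashes is not documented here. A minimal sketch, assuming each sketch value is written as 8 little-endian bytes in sketch order (sketchMD5 is a hypothetical name), shows the idea with the standard library only:

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// sketchMD5 returns the hex-encoded MD5 of a sketch, assuming each value is
// written as 8 little-endian bytes in sketch order (an illustrative choice).
func sketchMD5(sketch []uint64) string {
	h := md5.New()
	buf := make([]byte, 8)
	for _, v := range sketch {
		binary.LittleEndian.PutUint64(buf, v)
		h.Write(buf) // hash.Hash writes never return an error
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	fmt.Println(sketchMD5([]uint64{1, 2, 3}))
}
```

Because the digest depends on value order, it is only stable if the sketch is kept in a fixed order.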

type KMVsketch

type KMVsketch struct {
	KmerSize   uint     `json:"ksize"`
	Md5sum     string   `json:"md5sum"`
	Sketch     []uint64 `json:"mins"`
	SketchSize uint     `json:"num"`
	// contains filtered or unexported fields
}

KMVsketch is the bottom-k MinHash sketch of a set

func NewKMVsketch

func NewKMVsketch(k, s uint) *KMVsketch

NewKMVsketch is the constructor for a KMVsketch

func (*KMVsketch) AddHash

func (KMVsketch *KMVsketch) AddHash(hv uint64)

AddHash is a method to evaluate a hash value and add any minimums to the sketch
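A bottom-k AddHash is typically built on a max-heap: while the sketch holds fewer than s values every new hash is kept, and afterwards a hash is kept only if it is smaller than the current maximum, which is then evicted. The sketch below assumes each distinct hash should be counted once; bottomK, newBottomK, addHash and maxHeap are hypothetical names, and this is not the package's source.

```go
package main

import (
	"container/heap"
	"fmt"
)

// maxHeap keeps the largest retained value at index 0 (hypothetical helper
// standing in for IntHeap).
type maxHeap []uint64

func (h maxHeap) Len() int            { return len(h) }
func (h maxHeap) Less(i, j int) bool  { return h[i] > h[j] }
func (h maxHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *maxHeap) Push(x interface{}) { *h = append(*h, x.(uint64)) }
func (h *maxHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// bottomK is a hypothetical KMV sketch holding the s smallest distinct hashes.
type bottomK struct {
	s    int
	vals maxHeap
	in   map[uint64]bool // values currently held, so duplicates are skipped
}

func newBottomK(s int) *bottomK { return &bottomK{s: s, in: make(map[uint64]bool)} }

// addHash keeps hv if the sketch is not yet full, or if hv is smaller than
// the largest retained value, which is then evicted.
func (b *bottomK) addHash(hv uint64) {
	if b.s == 0 || b.in[hv] {
		return
	}
	if b.vals.Len() < b.s {
		heap.Push(&b.vals, hv)
		b.in[hv] = true
		return
	}
	if hv < b.vals[0] {
		delete(b.in, heap.Pop(&b.vals).(uint64)) // evict the current maximum
		heap.Push(&b.vals, hv)
		b.in[hv] = true
	}
}

func main() {
	b := newBottomK(3)
	for _, hv := range []uint64{50, 10, 40, 10, 30, 20} {
		b.addHash(hv)
	}
	fmt.Println(b.vals.Len(), b.vals[0]) // 3 30: keeps {10, 20, 30}
}
```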

func (*KMVsketch) GetAlgo

func (KMVsketch *KMVsketch) GetAlgo() string

GetAlgo is a method to return the sketching algorithm used

func (*KMVsketch) GetMD5

func (KMVsketch *KMVsketch) GetMD5() string

GetMD5 is a method to return the MD5 currently calculated for the sketch

func (*KMVsketch) GetSimilarity

func (mh1 *KMVsketch) GetSimilarity(mh2 MinHash) (float64, error)

GetSimilarity computes a similarity estimate for two KMV sketches.
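One standard bottom-k estimator takes the k smallest values of the union of two sketches and reports the fraction of them that appear in both. A minimal sketch follows, assuming both inputs are sorted low to high with no duplicates; kmvJaccard is a hypothetical name, and the package's exact estimator is not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// kmvJaccard estimates Jaccard similarity from two sorted bottom-k sketches
// by walking the k smallest values of their union and counting how many of
// those appear in both sketches.
func kmvJaccard(a, b []uint64) (float64, error) {
	k := len(a)
	if len(b) < k {
		k = len(b)
	}
	if k == 0 {
		return 0, errors.New("cannot compare an empty sketch")
	}
	i, j, shared := 0, 0, 0
	// Each step consumes the next-smallest value of the union, so after k
	// steps exactly the k smallest union values have been examined.
	for step := 0; step < k; step++ {
		switch {
		case a[i] == b[j]:
			shared++
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return float64(shared) / float64(k), nil
}

func main() {
	sim, err := kmvJaccard([]uint64{1, 3, 5, 7}, []uint64{1, 2, 5, 8})
	fmt.Println(sim, err) // 0.5 <nil>: union bottom-4 is {1, 2, 3, 5}, shared {1, 5}
}
```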

func (*KMVsketch) GetSketch

func (KMVsketch *KMVsketch) GetSketch() []uint64

GetSketch is a method to set and return the sketch held by a MinHash KMV sketch

func (*KMVsketch) SetMD5

func (KMVsketch *KMVsketch) SetMD5()

SetMD5 is a method to calculate and store the MD5 for the sketch

func (*KMVsketch) SetSketch

func (KMVsketch *KMVsketch) SetSketch()

SetSketch converts the current IntHeap into a []uint64 and sorts it in ascending order (low to high).
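Turning heap storage into a sorted sketch amounts to copying the backing slice and sorting the copy, since a heap's slice is only partially ordered. toSortedSketch is a hypothetical helper, and the input literal is just one valid max-heap ordering of three values:

```go
package main

import (
	"fmt"
	"sort"
)

// toSortedSketch copies a heap's backing slice and sorts the copy low -> high,
// leaving the heap itself untouched.
func toSortedSketch(h []uint64) []uint64 {
	out := make([]uint64, len(h))
	copy(out, h)
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	heapOrder := []uint64{30, 10, 20} // max-heap order: largest at index 0
	fmt.Println(toSortedSketch(heapOrder)) // [10 20 30]
}
```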

type MinHash

type MinHash interface {
	AddHash(uint64)
	GetSketch() []uint64
	GetSimilarity(mh2 MinHash) (float64, error)
}

MinHash is an interface to group the different flavours of MinHash implemented here
