Documentation
Overview
Package histosketch is a Go implementation of HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms with Concept Drift (https://exascale.info/assets/pdf/icdm2017_HistoSketch.pdf).

I've made some changes in my implementation compared to the paper:

- Instead of providing the number of histogram bins (Dimensions) and the number of count-min hash tables (d), I have decided to use epsilon and delta values to calculate the count-min sketch (CMS) dimensions.
- As I am using HistoSketch to sketch CMS counters, the Dimensions of the histosketch are determined by the CMS dimensions.
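The overview does not say how epsilon and delta are turned into CMS dimensions. The conventional count-min sketch sizing uses a width of ⌈e/ε⌉ counters per row and a depth of ⌈ln(1/δ)⌉ rows, which bounds the overestimate of any count by ε·N (N being the total count added) with probability at least 1 − δ. The sketch below assumes the package follows these standard formulas; the type `countMin` and the functions `cmsDimensions`, `newCountMin`, `add` and `estimate` are hypothetical names, not part of the package API.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"math"
)

// countMin is a minimal, hypothetical count-min sketch (CMS); it is not the
// package's internal type.
type countMin struct {
	width, depth uint
	counters     [][]float64 // counters[row][column]
}

// cmsDimensions applies the conventional CMS sizing. Assumption: the package
// derives its dimensions with these same formulas.
func cmsDimensions(epsilon, delta float64) (width, depth uint) {
	width = uint(math.Ceil(math.E / epsilon))     // bounds the overestimate
	depth = uint(math.Ceil(math.Log(1 / delta))) // bounds the failure probability
	return width, depth
}

func newCountMin(epsilon, delta float64) *countMin {
	w, d := cmsDimensions(epsilon, delta)
	counters := make([][]float64, d)
	for i := range counters {
		counters[i] = make([]float64, w)
	}
	return &countMin{width: w, depth: d, counters: counters}
}

// column hashes the element together with the row index so that each row
// acts like a separate hash function (FNV-1a is an illustrative choice).
func (cm *countMin) column(row uint, element uint64) uint {
	var buf [16]byte
	binary.LittleEndian.PutUint64(buf[:8], element)
	binary.LittleEndian.PutUint64(buf[8:], uint64(row))
	h := fnv.New64a()
	h.Write(buf[:])
	return uint(h.Sum64() % uint64(cm.width))
}

// add increments the element's counter in every row.
func (cm *countMin) add(element uint64, count float64) {
	for row := uint(0); row < cm.depth; row++ {
		cm.counters[row][cm.column(row, element)] += count
	}
}

// estimate returns the smallest counter across rows; with non-negative
// updates it never underestimates the true count.
func (cm *countMin) estimate(element uint64) float64 {
	est := math.Inf(1)
	for row := uint(0); row < cm.depth; row++ {
		if v := cm.counters[row][cm.column(row, element)]; v < est {
			est = v
		}
	}
	return est
}

func main() {
	cm := newCountMin(0.01, 0.01)
	fmt.Println(cm.width, cm.depth) // 272 5
	for i := 0; i < 3; i++ {
		cm.add(42, 1)
	}
	fmt.Println(cm.estimate(42)) // 3
}
```

Under this sizing, halving epsilon doubles the memory per row, while shrinking delta only adds rows logarithmically.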
Constants
const DISTRIBUTION_SEED int64 = 1
DISTRIBUTION_SEED is the seed used to generate the distributions for consistent weighted sampling (CWS).
const MAX_K uint = 31
MAX_K is the maximum k-mer size currently supported by HULK.
Variables
This section is empty.
Functions
This section is empty.
Types
type CWS
type CWS struct {
// contains filtered or unexported fields
}
CWS is a struct that holds the consistent weighted sampling information.
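The fields of CWS are unexported, so what it stores is not documented here. As background, the HistoSketch paper builds on Ioffe's improved consistent weighted sampling (ICWS): for every pair of histogram bin k and sketch slot j, three random variables are drawn once, r_kj ~ Gamma(2, 1), c_kj ~ Gamma(2, 1) and β_kj ~ Uniform(0, 1). That these are the distributions seeded by DISTRIBUTION_SEED and held in CWS is an assumption. For an incoming element with weight V_k, each slot j computes a hash value a_kj and keeps the minimum:

```latex
y_{kj} = \exp\!\left( r_{kj} \left( \left\lfloor \frac{\ln V_k}{r_{kj}} + \beta_{kj} \right\rfloor - \beta_{kj} \right) \right),
\qquad
a_{kj} = \frac{c_{kj}}{y_{kj} \, e^{r_{kj}}}

S_j = \arg\min_k a_{kj}, \qquad A_j = \min_k a_{kj}
```

S and A correspond to the Sketch ("S in paper") and SketchWeights ("A in paper") fields of HistoSketch.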
type HistoSketch
type HistoSketch struct {
	KmerSize          uint      `json:"ksize"`              // the size of the k-mer used in the histosketch
	Md5sum            string    `json:"md5sum"`             // md5sum of the sketch
	Sketch            []uint    `json:"mins"`               // S in paper
	SketchWeights     []float64 `json:"weights"`            // A in paper
	SketchSize        uint      `json:"num"`                // number of minimums in the histosketch
	Dimensions        int32     `json:"num_histogram_bins"` // number of histogram bins
	ApplyConceptDrift bool      `json:"concept_drift"`      // if true, uniform scaling will be applied to frequency estimates (in the CMS) and a decay ratio will be applied to sketch elements prior to assessing incoming elements
	// contains filtered or unexported fields
}
HistoSketch is the histosketch data structure.
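Because of the struct tags, the JSON field names differ from the Go field names (for example, Sketch is serialised as "mins" and Dimensions as "num_histogram_bins"). The sketch below mirrors only the exported fields in a hypothetical `histoSketchJSON` type, with a hypothetical `encodeSketch` helper, to show the resulting encoding; `encoding/json` never serialises unexported fields, so whatever CWS or CMS state HistoSketch keeps internally would not appear in this output either.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// histoSketchJSON is a hypothetical mirror of HistoSketch's exported fields,
// carrying the same JSON tags as the documented struct.
type histoSketchJSON struct {
	KmerSize          uint      `json:"ksize"`
	Md5sum            string    `json:"md5sum"`
	Sketch            []uint    `json:"mins"`
	SketchWeights     []float64 `json:"weights"`
	SketchSize        uint      `json:"num"`
	Dimensions        int32     `json:"num_histogram_bins"`
	ApplyConceptDrift bool      `json:"concept_drift"`
}

// encodeSketch marshals the sketch; fields appear in declaration order.
func encodeSketch(s histoSketchJSON) (string, error) {
	out, err := json.Marshal(s)
	return string(out), err
}

func main() {
	out, err := encodeSketch(histoSketchJSON{
		KmerSize:      21,
		Sketch:        []uint{3, 7},
		SketchWeights: []float64{0.5, 0.25},
		SketchSize:    2,
		Dimensions:    100,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// {"ksize":21,"md5sum":"","mins":[3,7],"weights":[0.5,0.25],"num":2,"num_histogram_bins":100,"concept_drift":false}
}
```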
func NewHistoSketch
func NewHistoSketch(kmerSize, histosketchLength uint, numHistogramBins int32, decayRatio float64) (*HistoSketch, error)
NewHistoSketch is the constructor function for HistoSketch.
func (*HistoSketch) AddElement
func (HistoSketch *HistoSketch) AddElement(bin uint64, value float64) error
AddElement assesses an incoming histogram element and adds it to the histosketch if required.
func (*HistoSketch) GetAlgo
func (HistoSketch *HistoSketch) GetAlgo() string
GetAlgo returns the sketching algorithm used.
func (*HistoSketch) GetMD5
func (HistoSketch *HistoSketch) GetMD5() string
GetMD5 returns the MD5 currently calculated for the histosketch.
func (*HistoSketch) GetSketch
func (HistoSketch *HistoSketch) GetSketch() []uint64
GetSketch returns the current histosketch.
func (*HistoSketch) SetMD5
func (HistoSketch *HistoSketch) SetMD5()
SetMD5 calculates and stores the MD5 for the histosketch.
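An MD5 digest gives a compact fingerprint of the sketch contents. The documentation does not say which bytes SetMD5 hashes, so the sketch below is only one plausible way to fingerprint the values returned by GetSketch: the helper name `sketchMD5` and the byte layout (8 little-endian bytes per value) are assumptions, and its digests are not expected to match GetMD5.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// sketchMD5 is a hypothetical helper: it writes each sketch value as
// 8 little-endian bytes into an MD5 hash and returns the hex digest.
func sketchMD5(sketch []uint64) string {
	h := md5.New()
	buf := make([]byte, 8)
	for _, v := range sketch {
		binary.LittleEndian.PutUint64(buf, v)
		h.Write(buf)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	fmt.Println(sketchMD5([]uint64{3, 7, 7, 1}))
}
```

Hashing the values in slot order means two sketches share a digest only if they selected the same bins in the same slots.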