histosketch

package
v0.0.0-...-cbd5ca3
Published: May 25, 2023 License: MIT Imports: 5 Imported by: 0

Documentation

Overview

Package histosketch is a Go implementation of HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms with Concept Drift (https://exascale.info/assets/pdf/icdm2017_HistoSketch.pdf).

I've made some changes in my implementation compared to the paper:

- Instead of providing the number of histogram bins (Dimensions) and the number of countmin hash tables (d), I have decided to use epsilon and delta values to calculate CMS Dimensions.
- As I am using HistoSketch to sketch CMS counters, the Dimensions of the histosketch are determined by the CMS Dimensions.
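To illustrate how epsilon and delta can determine count-min sketch dimensions, here is a minimal sketch using the textbook bounds width = ceil(e / epsilon) and depth = ceil(ln(1 / delta)). The function name cmsDimensions is hypothetical, and the package itself may compute its dimensions differently.

```go
package main

import (
	"fmt"
	"math"
)

// cmsDimensions is a hypothetical helper (not part of the histosketch package).
// It applies the standard count-min sketch sizing:
//   width = ceil(e / epsilon), depth = ceil(ln(1 / delta))
// where epsilon is the relative error bound and delta the failure probability.
func cmsDimensions(epsilon, delta float64) (width, depth uint, err error) {
	if epsilon <= 0 || epsilon >= 1 || delta <= 0 || delta >= 1 {
		return 0, 0, fmt.Errorf("epsilon and delta must be in (0, 1)")
	}
	width = uint(math.Ceil(math.E / epsilon))
	depth = uint(math.Ceil(math.Log(1 / delta)))
	return width, depth, nil
}

func main() {
	w, d, err := cmsDimensions(0.001, 0.01)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("width=%d depth=%d\n", w, d)
}
```

A smaller epsilon widens each hash table and a smaller delta adds more tables, so both trade memory for accuracy.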

Constants

const DISTRIBUTION_SEED int64 = 1

DISTRIBUTION_SEED is used to generate the distributions for the CWS

const MAX_K uint = 31

MAX_K is the maximum k-mer size currently supported by HULK

Variables

This section is empty.

Functions

This section is empty.

Types

type CWS

type CWS struct {
	// contains filtered or unexported fields
}

CWS is a struct to hold the consistent weighted sampling information

type HistoSketch

type HistoSketch struct {
	KmerSize          uint      `json:"ksize"`              // the size of the k-mer used in the histosketch
	Md5sum            string    `json:"md5sum"`             // md5sum of the sketch
	Sketch            []uint    `json:"mins"`               // S in paper
	SketchWeights     []float64 `json:"weights"`            // A in paper
	SketchSize        uint      `json:"num"`                // number of minimums in the histosketch
	Dimensions        int32     `json:"num_histogram_bins"` // number of histogram bins
	ApplyConceptDrift bool      `json:"concept_drift"`      // if true, uniform scaling will be applied to frequency estimates (in the CMS) and a decay ratio will be applied to sketch elements prior to assessing incoming elements
	// contains filtered or unexported fields
}

HistoSketch is the histosketch data structure
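The struct tags above determine how the exported fields appear in JSON. The following sketch shows the resulting layout using a local mirror type, sketchJSON, which is hypothetical and copies only the exported fields and tags listed in the documentation; the real type also has unexported fields, and the values used here are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sketchJSON is a hypothetical local mirror of the exported HistoSketch
// fields, reproduced here so the example does not depend on the package.
type sketchJSON struct {
	KmerSize          uint      `json:"ksize"`
	Md5sum            string    `json:"md5sum"`
	Sketch            []uint    `json:"mins"`
	SketchWeights     []float64 `json:"weights"`
	SketchSize        uint      `json:"num"`
	Dimensions        int32     `json:"num_histogram_bins"`
	ApplyConceptDrift bool      `json:"concept_drift"`
}

// encode marshals the mirror struct and returns the JSON as a string.
func encode(s sketchJSON) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Illustrative values only.
	out, err := encode(sketchJSON{
		KmerSize:      21,
		Sketch:        []uint{1, 2},
		SketchWeights: []float64{0.5, 0.25},
		SketchSize:    2,
		Dimensions:    100,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```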

func NewHistoSketch

func NewHistoSketch(kmerSize, histosketchLength uint, numHistogramBins int32, decayRatio float64) (*HistoSketch, error)

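NewHistoSketch is the constructor function. It takes the k-mer size, the histosketch length, the number of histogram bins and the decay ratio, and returns a pointer to a new HistoSketch or an error.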

func (*HistoSketch) AddElement

func (HistoSketch *HistoSketch) AddElement(bin uint64, value float64) error

AddElement is a method to assess an incoming histogram element and add it to the histosketch if required
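As a rough illustration of what "assessing" an incoming element can mean in a consistent weighted sampling scheme, the sketch below computes a hash value for a (bin, weight) pair and keeps it in a slot only if it is smaller than the aged value already stored there. Everything here is an assumption: the names slotParams, sample and update are hypothetical, the formula is a simplified form in the spirit of the HistoSketch paper, and the direction in which the decay ratio is applied is a guess rather than the package's confirmed behaviour.

```go
package main

import (
	"fmt"
	"math"
)

// slotParams holds hypothetical per-slot random draws for one histogram bin.
// In a real CWS these would be drawn once from fixed-seed distributions.
type slotParams struct{ r, c, beta float64 }

// sample returns a CWS-style hash value for a bin with the given weight.
// Larger weights give smaller values, so heavier bins win slots more often.
// Assumed formula: y = exp(ln(weight) - r*beta), a = c / (y * exp(r)).
func sample(p slotParams, weight float64) float64 {
	y := math.Exp(math.Log(weight) - p.r*p.beta)
	return p.c / (y * math.Exp(p.r))
}

// update ages the stored value by dividing it by decay (assumed to lie in
// (0, 1]), then replaces the slot if the candidate value is smaller.
// It returns the bin and value the slot holds afterwards.
func update(slotBin uint64, slotVal float64, bin uint64, cand, decay float64) (uint64, float64) {
	aged := slotVal / decay
	if cand < aged {
		return bin, cand
	}
	return slotBin, aged
}

func main() {
	p := slotParams{r: 1.2, c: 0.8, beta: 0.3} // illustrative draws
	cand := sample(p, 5.0)
	bin, val := update(7, 1.0, 9, cand, 0.9)
	fmt.Println(bin, val)
}
```

With decay set to 1.0 no forgetting takes place, which would correspond to ApplyConceptDrift being false under these assumptions.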

func (*HistoSketch) GetAlgo

func (HistoSketch *HistoSketch) GetAlgo() string

GetAlgo is a method to return the sketching algorithm used

func (*HistoSketch) GetMD5

func (HistoSketch *HistoSketch) GetMD5() string

GetMD5 is a method to return the MD5 currently calculated for the histosketch

func (*HistoSketch) GetSketch

func (HistoSketch *HistoSketch) GetSketch() []uint64

GetSketch is a method to return the current histosketch

func (*HistoSketch) SetMD5

func (HistoSketch *HistoSketch) SetMD5()

SetMD5 is a method to calculate and store the MD5 for the histosketch
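The documentation does not say which bytes are hashed to produce the MD5. As a minimal sketch of one possible approach, the hypothetical helper sketchMD5 below hashes the little-endian encoding of each sketch value and returns the digest as a hex string; the package may well serialise the sketch differently.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// sketchMD5 is a hypothetical helper (not the package's SetMD5).
// It feeds each sketch value into MD5 as 8 little-endian bytes and
// returns the 32-character hexadecimal digest.
func sketchMD5(sketch []uint64) string {
	h := md5.New()
	buf := make([]byte, 8)
	for _, v := range sketch {
		binary.LittleEndian.PutUint64(buf, v)
		h.Write(buf) // hash.Hash.Write never returns an error
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	fmt.Println(sketchMD5([]uint64{1, 2, 3}))
}
```

Because the digest depends only on the sketch values and their order, two identical sketches yield the same string, which makes it usable as a quick equality check.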
