dsstats

Version: v0.3.0
Warning

This package is not in the latest version of its module.

Published: May 4, 2021 License: MIT Imports: 8 Imported by: 2

Documentation

Overview

Package dsstats calculates statistical metadata for a given dataset.

Index

Constants

This section is empty.

Variables

var (
	// StopFreqCountThreshold is the number of unique values past which we will
	// stop keeping frequencies. This is a simplistic line of defense against
	// unwieldy memory consumption. The limit is 200 because a longer list of
	// frequencies is not really user-friendly output.
	StopFreqCountThreshold = 200
	// HistogramCentroidCount is the max number of centroids/bins for our
	// histogram calculations. More bins give better precision at the cost of
	// more CPU and memory usage. 32 is a reasonable count for fairly large
	// datasets.
	HistogramCentroidCount uint = 32
)
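The StopFreqCountThreshold comment describes giving up on frequency tracking once too many distinct values appear. A minimal sketch of that idea, assuming frequencies live in a map that is simply discarded once the number of distinct values exceeds the threshold. The `freqCounter` type and its methods are hypothetical illustrations, not part of the dsstats API.

```go
package main

import "fmt"

// freqCounter is a hypothetical capped frequency table; it is not part of
// the dsstats API.
type freqCounter struct {
	threshold int            // max number of distinct values to track
	freqs     map[string]int // value -> occurrence count; nil once abandoned
}

func newFreqCounter(threshold int) *freqCounter {
	return &freqCounter{threshold: threshold, freqs: map[string]int{}}
}

// add records one occurrence of v. Once more than threshold distinct values
// have been seen, the table is dropped and never updated again.
func (c *freqCounter) add(v string) {
	if c.freqs == nil {
		return // already past the threshold
	}
	c.freqs[v]++
	if len(c.freqs) > c.threshold {
		c.freqs = nil // stop keeping frequencies to bound memory use
	}
}

func main() {
	c := newFreqCounter(2)
	for _, v := range []string{"a", "b", "a"} {
		c.add(v)
	}
	fmt.Println(c.freqs) // map[a:2 b:1]
	c.add("c")
	fmt.Println(c.freqs == nil) // true
}
```

Dropping the whole table, rather than keeping only the first N values, avoids reporting frequencies that would be misleadingly partial.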

Functions

func Calculate

func Calculate(ds *dataset.Dataset) (st *dataset.Stats, err error)

Calculate determines a stats component by reading each entry in the Body of a given dataset. It requires an open BodyFile and a well-formed Structure component.

func CalculateFromEntryReader

func CalculateFromEntryReader(r dsio.EntryReader) (st *dataset.Stats, err error)

CalculateFromEntryReader consumes an entry reader to generate a Stats component.
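Consuming an entry reader generally means reading until the reader is exhausted and feeding each entry to something that accumulates results. A sketch of that loop, assuming a reader whose ReadEntry method returns io.EOF when it runs out of entries. The `entry`, `entryReader`, `entryWriter`, `sliceReader`, `counter`, and `drain` names are hypothetical local stand-ins, not the dsio types or the package's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// entry is a hypothetical stand-in for one row of structured data.
type entry struct {
	Value interface{}
}

// entryReader is a hypothetical minimal reader; it returns io.EOF once
// every entry has been read.
type entryReader interface {
	ReadEntry() (entry, error)
}

// entryWriter is a hypothetical sink in the spirit of Accumulator.WriteEntry.
type entryWriter interface {
	WriteEntry(entry) error
	Close() error
}

// drain copies every entry from r into w, then closes w so that any
// accumulated results can be finalized.
func drain(r entryReader, w entryWriter) error {
	for {
		ent, err := r.ReadEntry()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return err
		}
		if err := w.WriteEntry(ent); err != nil {
			return err
		}
	}
	return w.Close()
}

// sliceReader serves entries from an in-memory slice.
type sliceReader struct {
	ents []entry
	i    int
}

func (s *sliceReader) ReadEntry() (entry, error) {
	if s.i >= len(s.ents) {
		return entry{}, io.EOF
	}
	ent := s.ents[s.i]
	s.i++
	return ent, nil
}

// counter counts entries and records whether it was closed.
type counter struct {
	n      int
	closed bool
}

func (c *counter) WriteEntry(entry) error { c.n++; return nil }
func (c *counter) Close() error          { c.closed = true; return nil }

func main() {
	r := &sliceReader{ents: []entry{{1}, {2}, {3}}}
	c := &counter{}
	if err := drain(r, c); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.n, c.closed) // 3 true
}
```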

func ToMap

func ToMap(s Statser) []map[string]interface{}

ToMap converts the stats reported by a Statser into Plain Old Data: a slice of maps.
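Given the Stat and Statser interfaces documented in this package, one plausible shape for such a conversion is to collect each Stat's Map output in order. This is a sketch under that assumption, not the package's implementation; `toMap`, `countStat`, and `statList` are hypothetical names, and putting a "type" key inside `countStat.Map` is an illustrative choice.

```go
package main

import "fmt"

// Stat and Statser mirror the interfaces documented in this package.
type Stat interface {
	Type() string
	Map() map[string]interface{}
}

type Statser interface {
	Stats() []Stat
}

// toMap is a hypothetical sketch: it collects each Stat's Map output, in order.
func toMap(s Statser) []map[string]interface{} {
	stats := s.Stats()
	out := make([]map[string]interface{}, 0, len(stats))
	for _, st := range stats {
		out = append(out, st.Map())
	}
	return out
}

// countStat is a hypothetical Stat that reports a single count.
type countStat struct{ count int }

func (c countStat) Type() string { return "count" }

// Map never returns nil, per the Stat interface contract.
func (c countStat) Map() map[string]interface{} {
	return map[string]interface{}{"type": c.Type(), "count": c.count}
}

// statList is a hypothetical Statser backed by a slice.
type statList []Stat

func (l statList) Stats() []Stat { return l }

func main() {
	fmt.Println(toMap(statList{countStat{3}, countStat{5}}))
	// [map[count:3 type:count] map[count:5 type:count]]
}
```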

Types

type Accumulator

type Accumulator struct {
	// contains filtered or unexported fields
}

Accumulator wraps a dsio.EntryReader and updates its internal statistics on each read. Consumers can only assume the return value of Accumulator.Stats is final after a call to Close.
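The contract above (update on every entry, treat results as final only after Close) can be illustrated with a small numeric accumulator. This is a hypothetical sketch, not the package's Accumulator: `numericAccumulator` and its methods are made-up names, and computing the mean inside Close is an assumption chosen to show why results may only be final after closing.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// numericAccumulator is a hypothetical accumulate-then-close sketch; it is
// not the dsstats Accumulator.
type numericAccumulator struct {
	count    int
	sum      float64
	min, max float64
	mean     float64 // only valid after Close
	closed   bool
}

func newNumericAccumulator() *numericAccumulator {
	return &numericAccumulator{min: math.Inf(1), max: math.Inf(-1)}
}

// WriteEntry folds one value into the running totals.
func (a *numericAccumulator) WriteEntry(v float64) error {
	if a.closed {
		return errors.New("accumulator is closed")
	}
	a.count++
	a.sum += v
	a.min = math.Min(a.min, v)
	a.max = math.Max(a.max, v)
	return nil
}

// Close computes derived values; only after this are the stats final.
func (a *numericAccumulator) Close() error {
	if a.count > 0 {
		a.mean = a.sum / float64(a.count)
	}
	a.closed = true
	return nil
}

// Stats reports the current statistics as a map.
func (a *numericAccumulator) Stats() map[string]interface{} {
	return map[string]interface{}{
		"count": a.count, "min": a.min, "max": a.max, "mean": a.mean,
	}
}

func main() {
	a := newNumericAccumulator()
	for _, v := range []float64{2, 4, 9} {
		if err := a.WriteEntry(v); err != nil {
			fmt.Println(err)
			return
		}
	}
	_ = a.Close()
	fmt.Println(a.Stats()) // map[count:3 max:9 mean:5 min:2]
}
```

Keeping only running totals in WriteEntry makes each update O(1) in memory, while any statistic that needs the full picture is deferred to Close.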

func NewAccumulator

func NewAccumulator(st *dataset.Structure) *Accumulator

NewAccumulator creates a stat accumulator for entries described by the given structure.

func (*Accumulator) Close

func (r *Accumulator) Close() error

Close finalizes the accumulated statistics.

func (*Accumulator) Stats

func (r *Accumulator) Stats() []Stat

Stats returns the statistics created by the accumulator.

func (*Accumulator) Structure

func (r *Accumulator) Structure() *dataset.Structure

Structure returns the structure being read.

func (*Accumulator) WriteEntry

func (r *Accumulator) WriteEntry(ent dsio.Entry) error

WriteEntry adds one row of structured data to the accumulated stats.

type Stat

type Stat interface {
	// Type returns a string identifier for the kind of statistic being reported
	Type() string
	// Map reports statistical details as a map; Map must not return nil
	Map() map[string]interface{}
}

Stat describes common features of all statistical types.

type Statser

type Statser interface {
	Stats() []Stat
}

Statser produces a slice of Stat objects.

Directories

Path          Synopsis
histosketch   Package histosketch introduces the histosketch implementation based on https://github.com/aaw/histosketch. histogram_sketch is an implementation of the Histogram Sketch data structure described in Ben-Haim and Tom-Tov's "A Streaming Parallel Decision Tree Algorithm" in Journal of Machine Learning Research 11 (http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf).
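The Ben-Haim and Tom-Tov histogram keeps at most a fixed number of (center, count) bins; inserting a value adds a bin, and when there are too many, the two adjacent bins with the closest centers are merged into their count-weighted mean. A simplified sketch of that update step follows. The `sketch` and `centroid` names are hypothetical, this is not the histosketch package's API, and treating HistogramCentroidCount as the `maxBins` bound is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// centroid is a bin with center p and count m.
type centroid struct {
	p float64
	m float64
}

// sketch is a hypothetical, simplified streaming histogram in the style of
// Ben-Haim and Tom-Tov: at most maxBins centroids, merging the closest
// adjacent pair on overflow.
type sketch struct {
	maxBins int
	bins    []centroid // kept sorted by p
}

func (s *sketch) add(v float64) {
	// Find the insertion point; if v is already a center, bump its count.
	i := sort.Search(len(s.bins), func(i int) bool { return s.bins[i].p >= v })
	if i < len(s.bins) && s.bins[i].p == v {
		s.bins[i].m++
		return
	}
	// Insert a new bin (v, 1) at position i, keeping the slice sorted.
	s.bins = append(s.bins, centroid{})
	copy(s.bins[i+1:], s.bins[i:])
	s.bins[i] = centroid{p: v, m: 1}

	if len(s.bins) <= s.maxBins {
		return
	}
	// Find the adjacent pair with the smallest gap between centers.
	j := 0
	for k := 1; k < len(s.bins)-1; k++ {
		if s.bins[k+1].p-s.bins[k].p < s.bins[j+1].p-s.bins[j].p {
			j = k
		}
	}
	// Replace the pair with one bin at their count-weighted mean.
	a, b := s.bins[j], s.bins[j+1]
	m := a.m + b.m
	s.bins[j] = centroid{p: (a.p*a.m + b.p*b.m) / m, m: m}
	s.bins = append(s.bins[:j+1], s.bins[j+2:]...)
}

func main() {
	s := &sketch{maxBins: 2}
	for _, v := range []float64{1, 2, 10} {
		s.add(v)
	}
	fmt.Println(s.bins) // [{1.5 2} {10 1}]
}
```

Because memory is bounded by `maxBins` regardless of how many values arrive, a larger bound buys precision at the cost of CPU and memory, which matches the tradeoff described for HistogramCentroidCount.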
