Documentation
Overview
Package dsstats calculates statistical metadata for a given dataset.
Index
Constants
This section is empty.
Variables
var (
	// StopFreqCountThreshold is the number of unique values past which we will
	// stop keeping frequencies. This is a simplistic line of defense against
	// unwieldy memory consumption. 200 because more is not really user-friendly
	// output.
	StopFreqCountThreshold = 200
	// HistogramCentroidCount is the max number of centroids/bins for our histogram
	// calculations. More bins give better precision at the tradeoff of more CPU
	// and memory usage. 32 is a pretty decent count for fairly large datasets.
	HistogramCentroidCount uint = 32
)
Functions
func Calculate
Calculate determines a stats component by reading each entry in the Body of a given dataset. It requires an open BodyFile and a well-formed Structure component.
func CalculateFromEntryReader
func CalculateFromEntryReader(r dsio.EntryReader) (st *dataset.Stats, err error)
CalculateFromEntryReader consumes an entry reader to generate a Stats component.
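Consuming an entry reader to produce stats amounts to a drain loop: read entries until the reader is exhausted, updating statistics along the way. The following is a minimal, self-contained sketch of that pattern. Entry, EntryReader, sliceReader, numericSummary, and summarize are simplified, hypothetical stand-ins rather than the real dsio or dataset types, and the sketch assumes the reader signals the end of data with io.EOF.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Entry is a simplified, hypothetical stand-in for dsio.Entry.
type Entry struct {
	Index int
	Value interface{}
}

// EntryReader is a hypothetical, trimmed-down reader interface for this sketch.
type EntryReader interface {
	ReadEntry() (Entry, error)
}

// sliceReader serves entries from an in-memory slice.
type sliceReader struct {
	vals []interface{}
	i    int
}

func (r *sliceReader) ReadEntry() (Entry, error) {
	if r.i >= len(r.vals) {
		return Entry{}, io.EOF
	}
	e := Entry{Index: r.i, Value: r.vals[r.i]}
	r.i++
	return e, nil
}

// numericSummary holds a few simple stats for numeric entries.
type numericSummary struct {
	Count    int
	Min, Max float64
}

// summarize reads entries until io.EOF, updating the summary as it goes.
func summarize(r EntryReader) (*numericSummary, error) {
	s := &numericSummary{}
	for {
		ent, err := r.ReadEntry()
		if errors.Is(err, io.EOF) {
			return s, nil
		}
		if err != nil {
			return nil, err
		}
		f, ok := ent.Value.(float64)
		if !ok {
			continue // this sketch skips non-numeric values
		}
		if s.Count == 0 || f < s.Min {
			s.Min = f
		}
		if s.Count == 0 || f > s.Max {
			s.Max = f
		}
		s.Count++
	}
}

func main() {
	s, err := summarize(&sliceReader{vals: []interface{}{3.0, "x", -1.5, 8.0}})
	fmt.Println(s.Count, s.Min, s.Max, err)
}
```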
Types
type Accumulator
type Accumulator struct {
// contains filtered or unexported fields
}
Accumulator wraps a dsio.EntryReader; on each call to read, it updates its internal statistics. Consumers can only assume the return value of Accumulator.Stats is final after a call to Close.
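The "final after Close" contract can be illustrated with a small accumulator that updates running values on every write but only reports a derived statistic once it has been closed. meanVarAccumulator is hypothetical and far simpler than dsstats.Accumulator; it uses Welford's online algorithm for mean and variance, which is not necessarily what dsstats uses internally.

```go
package main

import (
	"errors"
	"fmt"
)

// meanVarAccumulator is a hypothetical stand-in, not dsstats.Accumulator.
// It updates a running mean and sum of squared deviations on every write
// (Welford's online algorithm) and only reports variance once closed.
type meanVarAccumulator struct {
	n      int
	mean   float64
	m2     float64 // sum of squared deviations from the running mean
	closed bool
}

// WriteEntry folds one value into the running statistics.
func (a *meanVarAccumulator) WriteEntry(v float64) error {
	if a.closed {
		return errors.New("write to closed accumulator")
	}
	a.n++
	delta := v - a.mean
	a.mean += delta / float64(a.n)
	a.m2 += delta * (v - a.mean)
	return nil
}

// Close marks the accumulator as finished; further writes are rejected.
func (a *meanVarAccumulator) Close() error {
	a.closed = true
	return nil
}

// Stats returns count and mean at any time, but includes the sample
// variance only after Close, mirroring the "final after Close" contract.
func (a *meanVarAccumulator) Stats() map[string]float64 {
	s := map[string]float64{"count": float64(a.n), "mean": a.mean}
	if a.closed && a.n > 1 {
		s["variance"] = a.m2 / float64(a.n-1)
	}
	return s
}

func main() {
	acc := &meanVarAccumulator{}
	for _, v := range []float64{1, 2, 3, 4} {
		_ = acc.WriteEntry(v)
	}
	_, ok := acc.Stats()["variance"]
	fmt.Println("variance before Close:", ok)
	_ = acc.Close()
	fmt.Println(acc.Stats())
}
```

Because each write only touches a few running values, memory stays constant regardless of how many rows are streamed through.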
func NewAccumulator
func NewAccumulator(st *dataset.Structure) *Accumulator
NewAccumulator creates a stats accumulator for data described by the given Structure.
func (*Accumulator) Stats
func (r *Accumulator) Stats() []Stat
Stats returns the statistics collected by the accumulator.
func (*Accumulator) Structure
func (r *Accumulator) Structure() *dataset.Structure
Structure returns the structure being read.
func (*Accumulator) WriteEntry
func (r *Accumulator) WriteEntry(ent dsio.Entry) error
WriteEntry adds one row of structured data to the accumulated stats.
Directories
Path | Synopsis |
---|---|
histosketch | Package histosketch introduces the histosketch implementation based on https://github.com/aaw/histosketch. histogram_sketch is an implementation of the Histogram Sketch data structure described in Ben-Haim and Tom-Tov's "A Streaming Parallel Decision Tree Algorithm" in Journal of Machine Learning Research 11 (http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf). |
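For intuition about what a histogram sketch approximates, here is a minimal sketch of the update step from Ben-Haim and Tom-Tov's streaming histogram: each new value becomes a unit-weight centroid (or increments an existing centroid with the same value), and whenever the number of centroids exceeds a limit, the two closest centroids are merged into their weighted mean. streamingHistogram, centroid, and Add are hypothetical names, not the histosketch package's API; maxBins plays roughly the role that HistogramCentroidCount (32) plays in dsstats, set to 4 here to keep the demo small.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// centroid is one histogram bin: a center and the number of points it represents.
type centroid struct {
	Value float64
	Count float64
}

// streamingHistogram keeps at most maxBins centroids, sorted by Value.
type streamingHistogram struct {
	maxBins int
	bins    []centroid
}

// Add inserts x as a unit-weight centroid, then merges if over budget.
func (h *streamingHistogram) Add(x float64) {
	i := sort.Search(len(h.bins), func(i int) bool { return h.bins[i].Value >= x })
	if i < len(h.bins) && h.bins[i].Value == x {
		h.bins[i].Count++ // exact match: just bump the count
		return
	}
	h.bins = append(h.bins, centroid{})
	copy(h.bins[i+1:], h.bins[i:]) // shift right to keep bins sorted
	h.bins[i] = centroid{Value: x, Count: 1}
	if len(h.bins) > h.maxBins {
		h.mergeClosest()
	}
}

// mergeClosest replaces the adjacent pair with the smallest gap by their
// count-weighted mean, which always lies between them, so order is preserved.
func (h *streamingHistogram) mergeClosest() {
	best, minGap := 0, math.Inf(1)
	for i := 0; i+1 < len(h.bins); i++ {
		if gap := h.bins[i+1].Value - h.bins[i].Value; gap < minGap {
			best, minGap = i, gap
		}
	}
	a, b := h.bins[best], h.bins[best+1]
	n := a.Count + b.Count
	h.bins[best] = centroid{Value: (a.Value*a.Count + b.Value*b.Count) / n, Count: n}
	h.bins = append(h.bins[:best+1], h.bins[best+2:]...)
}

func main() {
	h := &streamingHistogram{maxBins: 4}
	for i := 1; i <= 10; i++ {
		h.Add(float64(i))
	}
	for _, c := range h.bins {
		fmt.Printf("center=%.2f count=%.0f\n", c.Value, c.Count)
	}
}
```

Merging the closest pair discards the least positional information per merge, which is why a fixed bin budget can still track dense regions of the data reasonably well.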