This tool generates CSV files for various statistical analyses of the t-digest's behavior. They are similar to the CSVs generated by Dunning's own tests here. To use:
go build .
./analysis [<rounds> [<samples> [<seed>]]]
This will populate a t-digest with <samples> samples, drawn from a uniform distribution seeded by <seed>, and then dump statistical information about the t-digest to various CSV files. It will repeat this process <rounds> times for the uniform distribution, then another <rounds> times for a normal distribution, and then another <rounds> times for an exponential distribution. The default values will run 10 rounds with 100,000 samples each, using a seed from the current time. This behavior is equivalent to Dunning's original tests.
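The argument handling and sampling loop described above can be sketched roughly as follows. This is an illustration only, not the tool's actual source: the names `config`, `parseArgs`, and `distributions` are hypothetical, the t-digest and CSV steps are replaced by a running mean so the example needs only the standard library, and it assumes a single seeded generator is shared across all rounds, which the real tool may not do.

```go
package main

import (
	"fmt"
	"math/rand"
	"os"
	"strconv"
	"time"
)

// config holds the three optional positional arguments (hypothetical type).
type config struct {
	rounds  int
	samples int
	seed    int64
}

// parseArgs reads [<rounds> [<samples> [<seed>]]]. Missing arguments fall
// back to the documented defaults: 10 rounds, 100,000 samples, and a
// time-based seed (passed in as now so the function is easy to test).
func parseArgs(args []string, now int64) (config, error) {
	cfg := config{rounds: 10, samples: 100000, seed: now}
	var err error
	if len(args) > 0 {
		if cfg.rounds, err = strconv.Atoi(args[0]); err != nil {
			return cfg, fmt.Errorf("rounds: %w", err)
		}
	}
	if len(args) > 1 {
		if cfg.samples, err = strconv.Atoi(args[1]); err != nil {
			return cfg, fmt.Errorf("samples: %w", err)
		}
	}
	if len(args) > 2 {
		if cfg.seed, err = strconv.ParseInt(args[2], 10, 64); err != nil {
			return cfg, fmt.Errorf("seed: %w", err)
		}
	}
	return cfg, nil
}

// distributions lists the three samplers in the order the README names them.
var distributions = []struct {
	name string
	draw func(*rand.Rand) float64
}{
	{"uniform", (*rand.Rand).Float64},
	{"normal", (*rand.Rand).NormFloat64},
	{"exponential", (*rand.Rand).ExpFloat64},
}

func main() {
	cfg, err := parseArgs(os.Args[1:], time.Now().UnixNano())
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Assumption: one generator, seeded once, feeds every round.
	rng := rand.New(rand.NewSource(cfg.seed))
	for _, d := range distributions {
		for round := 0; round < cfg.rounds; round++ {
			sum := 0.0
			for i := 0; i < cfg.samples; i++ {
				// The real tool would add this value to a t-digest instead.
				sum += d.draw(rng)
			}
			fmt.Printf("%s round %d: mean of %d samples = %.4f\n",
				d.name, round, cfg.samples, sum/float64(cfg.samples))
		}
	}
}
```

Passing the same seed twice should reproduce the same sample stream, which is the point of exposing <seed> as an argument.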
An R script has been included to draw various graphs from the data. It depends on data.table and ggplot2. Note that, depending on the size of the data, this script can take upwards of 10 minutes to finish.