Documentation ¶
Index ¶
- func EmphasisedRelevancy(r float64) float64
- func PlotHeatmap(corr mat.Matrix, xlabels []string, ylabels []string) (p *plot.Plot, err error)
- func TraditionalRelevancy(r float64) float64
- type ConfusionMatrix
- type PrecisionRecallCurve
- func (c PrecisionRecallCurve) AverageInterpolatedPrecision() float64
- func (c PrecisionRecallCurve) AveragePrecision() float64
- func (c PrecisionRecallCurve) InterpolatedPrecisionAt(r float64) float64
- func (c PrecisionRecallCurve) Plot() *plot.Plot
- func (c PrecisionRecallCurve) PrecisionAt(k int) float64
- func (c PrecisionRecallCurve) RPrecision() float64
- type RankingEvaluation
- type RelevancyFunction
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func EmphasisedRelevancy ¶
func EmphasisedRelevancy(r float64) float64
EmphasisedRelevancy is an alternative formulation of the relevancy function for calculating discounted cumulative gain that more strongly emphasises the degree of relevancy r.
func PlotHeatmap ¶
func PlotHeatmap(corr mat.Matrix, xlabels []string, ylabels []string) (p *plot.Plot, err error)
func TraditionalRelevancy ¶
func TraditionalRelevancy(r float64) float64
TraditionalRelevancy is the traditional formulation of the relevancy function for calculating discounted cumulative gain. It directly uses the specified degree of relevancy r.
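The exact formulas are not reproduced on this page; traditionally the relevancy function is the identity (rel(r) = r, as in TraditionalRelevancy) while the emphasised variant is commonly rel(r) = 2^r - 1, but treat those formulas as assumptions here. Either function can be supplied as the rel parameter of the discounted cumulative gain methods, for example (a minimal sketch that assumes it sits alongside the package, with "fmt" imported):

	predictions := []float64{0.9, 0.1, 0.7, 0.4} // predicted relevancy scores
	labels := []float64{3, 0, 2, 1}              // graded ground truth relevancies
	eval := NewRankingEvaluation(predictions, labels)

	// Both functions satisfy RelevancyFunction and may be passed as rel.
	fmt.Println(eval.DiscountedCumulativeGain(len(labels), TraditionalRelevancy))
	fmt.Println(eval.DiscountedCumulativeGain(len(labels), EmphasisedRelevancy))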
Types ¶
type ConfusionMatrix ¶
type ConfusionMatrix struct {
Observations, Pos, Neg, TruePos, TrueNeg, FalsePos, FalseNeg int
}
func NewConfusionMatrix ¶
func NewConfusionMatrix(predictions []float64, labels []float64, threshold float64) ConfusionMatrix
func (ConfusionMatrix) Accuracy ¶
func (c ConfusionMatrix) Accuracy() float64
func (ConfusionMatrix) F1 ¶
func (c ConfusionMatrix) F1() float64
func (ConfusionMatrix) Precision ¶
func (c ConfusionMatrix) Precision() float64
func (ConfusionMatrix) Recall ¶
func (c ConfusionMatrix) Recall() float64
func (ConfusionMatrix) String ¶
func (c ConfusionMatrix) String() string
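A minimal usage sketch (a hypothetical example that assumes it sits alongside the package with "fmt" imported, and assumes predictions above the threshold are treated as positive; the text above does not spell out the threshold semantics):

	// Predicted probabilities and their corresponding ground truth labels.
	predictions := []float64{0.9, 0.8, 0.3, 0.6, 0.2}
	labels := []float64{1, 1, 0, 0, 0}

	// Assumption: predictions above the threshold are counted as positive.
	cm := NewConfusionMatrix(predictions, labels, 0.5)

	fmt.Printf("accuracy=%.2f precision=%.2f recall=%.2f f1=%.2f\n",
		cm.Accuracy(), cm.Precision(), cm.Recall(), cm.F1())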
type PrecisionRecallCurve ¶
type PrecisionRecallCurve struct {
	// Precision is a slice containing the ranked precision values at K for the predictions until all
	// positive/relevant items were found according to the corresponding ground truth labels (recall==1)
	Precision []float64
	// Recall is a slice containing the ranked recall values at K for the predictions until all
	// positive/relevant items were found according to the corresponding ground truth labels (recall==1)
	Recall []float64
	// Thresholds is a slice containing the ranked (sorted) predictions (probability/similarity scores) until
	// all positive/relevant items were found according to the corresponding ground truth labels (recall==1)
	Thresholds []float64
	// contains filtered or unexported fields
}
PrecisionRecallCurve represents a precision recall curve for visualising and measuring the performance of a classification or information retrieval model. It can be used to evaluate how well the model's predictions can be ranked compared to a perfect ranking according to the ground truth labels. This is useful when evaluating ranking by relevancy for information retrieval, or raw classification performance based on the predicted probability of class membership (e.g. logistic regression predictions) without using a threshold to determine the class from the predicted probability. It is important to note that Precision[0] and Recall[0] indicate the precision and recall @ 0 and so will always be 1 and 0 respectively.
func NewPrecisionRecallCurve ¶
func NewPrecisionRecallCurve(predictions, labels []float64) PrecisionRecallCurve
NewPrecisionRecallCurve creates a new precision recall curve. The precision recall curve visualises how well the model's predictions (or similarity scores for information retrieval) can be ranked compared to a perfect ranking according to the ground truth labels. Both the supplied predictions and labels slices can be in any order provided they are of identical length and their ordering corresponds, e.g. predictions[5] corresponds to the ground truth labels[5]. As precision recall curves and average precision (summarising the curve as a single metric/area under the curve) represent a binary class/relevance measure, any label value greater than 0 is assumed to represent a positive/relevant observation (and a label value of 0 a negative/non-relevant observation).
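A minimal usage sketch (hypothetical example; assumes it sits alongside the package with "fmt" imported):

	// Predicted scores (probabilities or similarity scores) and binary ground
	// truth labels (values > 0 are treated as positive/relevant).
	predictions := []float64{0.2, 0.9, 0.6, 0.8, 0.1}
	labels := []float64{0, 1, 0, 1, 0}

	curve := NewPrecisionRecallCurve(predictions, labels)

	fmt.Println(curve.AveragePrecision())
	fmt.Println(curve.PrecisionAt(2)) // precision over the 2 highest scoring predictions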
func (PrecisionRecallCurve) AverageInterpolatedPrecision ¶
func (c PrecisionRecallCurve) AverageInterpolatedPrecision() float64
AverageInterpolatedPrecision calculates the average interpolated precision based on the predictions and labels the curve was constructed with. Average Interpolated Precision represents the area under the curve of the precision recall curve using interpolated precision for 11 fixed recall values {0.0, 0.1, 0.2, ... 1.0}.
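Conceptually this corresponds to averaging InterpolatedPrecisionAt over the 11 fixed recall points, roughly as sketched below (an illustration of the definition above, not necessarily the package's actual implementation; curve is a PrecisionRecallCurve as constructed in the earlier example, with "fmt" imported):

	sum := 0.0
	for i := 0; i <= 10; i++ {
		sum += curve.InterpolatedPrecisionAt(float64(i) / 10) // recall = 0.0, 0.1, ... 1.0
	}
	fmt.Println(sum / 11) // should be close to curve.AverageInterpolatedPrecision()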
func (PrecisionRecallCurve) AveragePrecision ¶
func (c PrecisionRecallCurve) AveragePrecision() float64
AveragePrecision calculates the average precision based on the predictions and labels the curve was constructed with. Average Precision represents the area under the curve of the precision recall curve and is a method for summarising the curve in a single metric.
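A common step-wise formulation of this area (an illustrative sketch only, not necessarily the exact code used by the package) weights each precision value by the change in recall between consecutive ranks, using the curve's exported Precision and Recall slices:

	ap := 0.0
	for k := 1; k < len(curve.Recall); k++ {
		ap += (curve.Recall[k] - curve.Recall[k-1]) * curve.Precision[k]
	}
	fmt.Println(ap) // should be close to curve.AveragePrecision()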
func (PrecisionRecallCurve) InterpolatedPrecisionAt ¶
func (c PrecisionRecallCurve) InterpolatedPrecisionAt(r float64) float64
InterpolatedPrecisionAt calculates an interpolated Precision@r. This can be used to calculate the precision for a specific recall value that does not necessarily occur explicitly in the ranking. It is calculated by taking the maximum precision value over all recalls greater than r.
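For example, if the curve contains the recall/precision pairs (0.5, 0.4) and (1.0, 0.6), then InterpolatedPrecisionAt(0.3) is 0.6, because 0.6 is the highest precision achieved at any recall greater than 0.3.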
func (PrecisionRecallCurve) Plot ¶
func (c PrecisionRecallCurve) Plot() *plot.Plot
Plot renders the entire precision recall curve as a plot for visualisation.
func (PrecisionRecallCurve) PrecisionAt ¶
func (c PrecisionRecallCurve) PrecisionAt(k int) float64
PrecisionAt calculates the Precision@k. This represents the precision at a certain cut-off k, i.e. if a search returns 10 results (k=10), what proportion of those 10 results is relevant, or, if we are only interested in the top ranked item (k=1), whether that item is relevant or not.
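For example, if 7 of the 10 highest scoring results are relevant then PrecisionAt(10) is 0.7, and PrecisionAt(1) is 1 if the single top ranked result is relevant and 0 otherwise.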
func (PrecisionRecallCurve) RPrecision ¶
func (c PrecisionRecallCurve) RPrecision() float64
RPrecision returns the R-Precision. The total number of relevant documents, R, is used as the cutoff for calculation, and this varies from query to query. It counts the number of results ranked above the cutoff that are relevant, r, and turns that into a relevancy fraction: r/R.
type RankingEvaluation ¶
type RankingEvaluation struct {
	// Ground truth relevancy values in original ordering
	Relevancies []float64
	// ranked indexes of relevancy values, ranked according to predicted relevancy/probability values
	PredictedRankInd []int
	// ranked indexes of relevancy values, ranked according to ground truth relevancy values (a perfect ranking)
	PerfectRankInd []int
}
RankingEvaluation is a type for evaluating rankings for information retrieval and classification, supporting calculation of [normalised] discounted cumulative gain.
func NewRankingEvaluation ¶
func NewRankingEvaluation(predictions, labels []float64) RankingEvaluation
NewRankingEvaluation creates a new RankingEvaluation type from the specified predicted relevancies (predictions) and ground truth relevancy values (labels). The ordering of both slices must correspond and the lengths must match.
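A minimal usage sketch (hypothetical example; assumes it sits alongside the package with "fmt" imported):

	// Predicted relevancy scores and graded ground truth relevancies, with
	// matching ordering.
	predictions := []float64{0.1, 0.9, 0.5}
	labels := []float64{0, 3, 1}

	eval := NewRankingEvaluation(predictions, labels)

	fmt.Println(eval.CumulativeGain(len(labels))) // no cut-off
	fmt.Println(eval.NormalisedDiscountedCumulativeGain(len(labels), TraditionalRelevancy)) // 1 indicates a perfect ranking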
func (RankingEvaluation) CumulativeGain ¶
func (r RankingEvaluation) CumulativeGain(k int) float64
CumulativeGain calculates the cumulative gain for the ranking. This is the cumulative gain, or sum of relevancy values, at each rank up to the kth ranked item, where k is the cut-off (specify len(Relevancies) for ALL items/no cut-off).
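For example, if the predictions rank items with ground truth relevancies 3, 1 and 0 in that order, then CumulativeGain(2) is 3 + 1 = 4.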
func (RankingEvaluation) DiscountedCumulativeGain ¶
func (r RankingEvaluation) DiscountedCumulativeGain(k int, rel RelevancyFunction) float64
DiscountedCumulativeGain calculates the discounted cumulative gain for the ranking. This is the cumulative gain, or sum of relevancy values, at each rank up to the kth ranked item, with each relevancy value discounted according to its rank so that relevancy values at lower ranks are more heavily discounted and therefore contribute less to the sum. k is the cut-off (specify len(Relevancies) for ALL items/no cut-off) and rel is the relevancy function to use. See TraditionalRelevancy and EmphasisedRelevancy for two popular formulations of the relevancy function, either of which may be specified for this parameter.
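The usual formulation (shown here as a sketch; the exact discount applied by this package is not spelled out in the text above) discounts each relevancy value by the logarithm of its rank:

	DCG@k = sum for i = 1..k of rel(relevancy_i) / log2(i + 1)

where relevancy_i is the ground truth relevancy of the item ranked at position i by the predictions and rel is the supplied RelevancyFunction.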
func (RankingEvaluation) NormalisedDiscountedCumulativeGain ¶
func (r RankingEvaluation) NormalisedDiscountedCumulativeGain(k int, rel RelevancyFunction) float64
NormalisedDiscountedCumulativeGain calculates the normalised discounted cumulative gain for the ranking. This is the ratio of the discounted cumulative gain for the given ranking to the discounted cumulative gain for a perfect ranking of the same items. k is the cut-off (specify len(Relevancies) for ALL items/no cut-off) and rel is the relevancy function to use. See TraditionalRelevancy and EmphasisedRelevancy for two popular formulations of the relevancy function, either of which may be specified for this parameter.
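In other words, NDCG@k = DCG@k / IDCG@k, where IDCG@k is the discounted cumulative gain of a perfect ranking of the same items, so a value of 1 indicates that the predicted ranking orders the items as well as the ground truth ordering up to the cut-off.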
type RelevancyFunction ¶
RelevancyFunction supports specification/weighting of relevancy values for calculating discounted cumulative gain