classifier

package
v0.50.0
Warning

This package is not in the latest version of its module.
Published: Oct 4, 2023 License: BSD-3-Clause, BSD-3-Clause Imports: 10 Imported by: 0

Documentation

Overview

Package classifier provides a machine learning classifier library, including CART, Random Forest, Cascaded Random Forest, and KNN.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ComputeAccuracies

func ComputeAccuracies(tp, fp, tn, fn []int64) (accuracies []float64)

ComputeAccuracies computes and returns the accuracy for each element of the true-positive, false-positive, true-negative, and false-negative arrays, using the formula

(tp + tn) / (tp + fp + tn + fn)
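As an illustration, the standard accuracy definition, (tp + tn) / (tp + fp + tn + fn), can be sketched with the standard library alone. The helper name `accuracies` is hypothetical and this is not the package's implementation. The sketch assumes all four slices have the same length, and returning 0 for an empty confusion matrix is an assumption made here to avoid NaN.

```go
package main

import "fmt"

// accuracies is a hypothetical helper, not the package's implementation.
// For each index i it returns (tp + tn) / (tp + fp + tn + fn), or 0 when
// the denominator is zero (an assumption made here to avoid NaN).
func accuracies(tp, fp, tn, fn []int64) []float64 {
	out := make([]float64, len(tp))
	for i := range tp {
		total := tp[i] + fp[i] + tn[i] + fn[i]
		if total == 0 {
			continue // leave 0 for an empty confusion matrix
		}
		out[i] = float64(tp[i]+tn[i]) / float64(total)
	}
	return out
}

func main() {
	// One classifier run: 1 TP, 1 FP, 2 TN, 0 FN, so 3 of 4 are correct.
	fmt.Println(accuracies([]int64{1}, []int64{1}, []int64{2}, []int64{0})) // [0.75]
}
```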

func ComputeElapsedTimes

func ComputeElapsedTimes(start, end []int64) (elaps []int64)

ComputeElapsedTimes computes and returns the elapsed time between each pair of `start` and `end` timestamps.
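A minimal sketch of the element-wise computation, assuming the elapsed time is simply `end[i] - start[i]` and that `end` is at least as long as `start`. The name `elapsedTimes` is hypothetical and this is not the package's code.

```go
package main

import "fmt"

// elapsedTimes is a hypothetical helper that subtracts each start
// timestamp from the end timestamp at the same index.
func elapsedTimes(start, end []int64) []int64 {
	elaps := make([]int64, len(start))
	for i := range start {
		elaps[i] = end[i] - start[i]
	}
	return elaps
}

func main() {
	fmt.Println(elapsedTimes([]int64{10, 20}, []int64{15, 50})) // [5 30]
}
```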

func ComputeFMeasures

func ComputeFMeasures(precisions, recalls []float64) (fmeasures []float64)

ComputeFMeasures computes the F-measure of each instance from the given arrays of precisions and recalls, and returns the results.
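For reference, a sketch of the F-measure as the harmonic mean of precision and recall, 2PR / (P + R). The helper `fMeasures` is hypothetical. Whether the package uses exactly this (balanced) variant is an assumption, as is returning 0 when both inputs are zero.

```go
package main

import "fmt"

// fMeasures is a hypothetical helper that returns the harmonic mean of
// each precision/recall pair: 2*p*r / (p + r). When p + r is zero the
// result is left at 0 (an assumption made here to avoid NaN).
func fMeasures(precisions, recalls []float64) []float64 {
	out := make([]float64, len(precisions))
	for i, p := range precisions {
		r := recalls[i]
		if p+r == 0 {
			continue
		}
		out[i] = 2 * p * r / (p + r)
	}
	return out
}

func main() {
	fmt.Println(fMeasures([]float64{0.5, 1}, []float64{0.5, 0})) // [0.5 0]
}
```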

Types

type CM

type CM struct {
	tabula.Dataset
	// contains filtered or unexported fields
}

CM represents the confusion matrix of a classification.

func (*CM) ComputeNumeric

func (cm *CM) ComputeNumeric(vs, actuals, predictions []int64)

ComputeNumeric calculates the confusion matrix from the value space `vs`, the actual values, and the prediction values.

func (*CM) ComputeStrings

func (cm *CM) ComputeStrings(valueSpace, targets, predictions []string)

ComputeStrings calculates the confusion matrix from the value space, the target class values, and the prediction class values.

func (*CM) FN

func (cm *CM) FN() int

FN returns the number of false negatives.

func (*CM) FNIndices

func (cm *CM) FNIndices() []int

FNIndices returns the indices of all false-negative samples.

func (*CM) FP

func (cm *CM) FP() int

FP returns the number of false positives in the confusion matrix.

func (*CM) FPIndices

func (cm *CM) FPIndices() []int

FPIndices returns the indices of all false-positive samples.

func (*CM) GetColumnClassError

func (cm *CM) GetColumnClassError() *tabula.Column

GetColumnClassError returns the last column, which contains the classification error.

func (*CM) GetFalseRate

func (cm *CM) GetFalseRate() float64

GetFalseRate returns the false-positive rate, computed as

false-positive / (false-positive + true-negative)

func (*CM) GetTrueRate

func (cm *CM) GetTrueRate() float64

GetTrueRate returns the true-positive rate, computed as

true-positive / (true-positive + false-positive)

func (*CM) GroupIndexPredictions

func (cm *CM) GroupIndexPredictions(sampleIds []int,
	actuals, predictions []int64,
)

GroupIndexPredictions groups the given sample indices by the outcome of their prediction. For example, given

sampleIds:   [0, 1, 2, 3, 4, 5]
actuals:     [1, 1, 0, 0, 1, 0]
predictions: [1, 0, 1, 0, 1, 1]

this function groups the indices into true-positive, false-positive, true-negative, and false-negative, which results in

true-positive indices:  [0, 4]
false-positive indices: [2, 5]
true-negative indices:  [3]
false-negative indices: [1]

This function assumes that the positive value is "1" and the negative value is "0".
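The grouping in the example can be reproduced with a short standalone sketch. The `groups` type and `groupIndices` function are hypothetical names, not part of the package. The sketch assumes the three slices are parallel, so the i-th actual and prediction belong to the i-th sample id.

```go
package main

import "fmt"

// groups holds sample indices per prediction outcome (hypothetical type).
type groups struct {
	TP, FP, TN, FN []int
}

// groupIndices assigns each sample id to one of the four outcomes,
// treating 1 as the positive class and 0 as the negative class.
func groupIndices(sampleIds []int, actuals, predictions []int64) groups {
	var g groups
	for i, id := range sampleIds {
		switch {
		case actuals[i] == 1 && predictions[i] == 1:
			g.TP = append(g.TP, id)
		case actuals[i] == 0 && predictions[i] == 1:
			g.FP = append(g.FP, id)
		case actuals[i] == 0 && predictions[i] == 0:
			g.TN = append(g.TN, id)
		case actuals[i] == 1 && predictions[i] == 0:
			g.FN = append(g.FN, id)
		}
	}
	return g
}

func main() {
	g := groupIndices(
		[]int{0, 1, 2, 3, 4, 5},
		[]int64{1, 1, 0, 0, 1, 0},
		[]int64{1, 0, 1, 0, 1, 1},
	)
	fmt.Println(g.TP, g.FP, g.TN, g.FN) // [0 4] [2 5] [3] [1]
}
```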

func (*CM) GroupIndexPredictionsStrings

func (cm *CM) GroupIndexPredictionsStrings(sampleIds []int,
	actuals, predictions []string,
)

GroupIndexPredictionsStrings is an alternative to GroupIndexPredictions that works with string classes.

func (*CM) String

func (cm *CM) String() (s string)

String returns the confusion matrix in a table-like format.

func (*CM) TN

func (cm *CM) TN() int

TN returns the number of true negatives.

func (*CM) TNIndices

func (cm *CM) TNIndices() []int

TNIndices returns the indices of all true-negative samples.

func (*CM) TP

func (cm *CM) TP() int

TP returns the number of true positives in the confusion matrix.

func (*CM) TPIndices

func (cm *CM) TPIndices() []int

TPIndices returns the indices of all true-positive samples.

type Runtime

type Runtime struct {

	// OOBStatsFile is the file where OOB statistic will be written.
	OOBStatsFile string `json:"OOBStatsFile"`

	// PerfFile is the file where statistic of performance will be written.
	PerfFile string `json:"PerfFile"`

	// StatFile is the file where statistic of classifying samples will be
	// written.
	StatFile string `json:"StatFile"`

	// RunOOB, if true, causes the OOB to be computed; the default is false.
	RunOOB bool `json:"RunOOB"`
	// contains filtered or unexported fields
}

Runtime defines a generic type that provides common fields to be embedded by a concrete classifier (e.g. RandomForest).

func (*Runtime) AddOOBCM

func (rt *Runtime) AddOOBCM(cm *CM)

AddOOBCM appends a new confusion matrix.

func (*Runtime) AddStat

func (rt *Runtime) AddStat(stat *Stat)

AddStat appends new classifier statistic data.

func (*Runtime) CloseOOBStatsFile

func (rt *Runtime) CloseOOBStatsFile() (e error)

CloseOOBStatsFile closes the statistics file that was opened for writing.

func (*Runtime) ComputeCM

func (rt *Runtime) ComputeCM(sampleIds []int,
	vs, actuals, predicts []string,
) (
	cm *CM,
)

ComputeCM computes the confusion matrix of the samples using the value space, the actual values, and the prediction values.

func (*Runtime) ComputeStatFromCM

func (rt *Runtime) ComputeStatFromCM(stat *Stat, cm *CM)

ComputeStatFromCM computes the statistic from the confusion matrix.

func (*Runtime) ComputeStatTotal

func (rt *Runtime) ComputeStatTotal(stat *Stat)

ComputeStatTotal computes the total statistic.

func (*Runtime) Finalize

func (rt *Runtime) Finalize() (e error)

Finalize finishes the runtime: it computes the total statistic, writes it to file, and closes the file.

func (*Runtime) Initialize

func (rt *Runtime) Initialize() error

Initialize starts the runtime for processing by saving the start time and opening the stats file.

func (*Runtime) OOBStats

func (rt *Runtime) OOBStats() *Stats

OOBStats returns all statistic objects.

func (*Runtime) OpenOOBStatsFile

func (rt *Runtime) OpenOOBStatsFile() error

OpenOOBStatsFile opens the statistics file for output.

func (*Runtime) Performance

func (rt *Runtime) Performance(samples tabula.ClasetInterface,
	predicts []string, probs []float64,
) (
	perfs Stats,
)

Performance computes the performance statistics of the classifier, given the samples with their actual class labels, the predictions, and the prediction probabilities.

The algorithm is:

(1) Sort the probabilities in descending order.
(2) Sort the actuals and predicts using the sorted index from probs.
(3) Compute tpr, fpr, and precision.
(4) Write the performance to file.
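Steps (1) to (3) can be sketched as follows. This is one plausible reading of the algorithm, not the package's implementation: the names `point` and `curve` are hypothetical, every sample ranked at or above the current position is treated as predicted positive, and step (4), writing to file, is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// point is a hypothetical record of the rates at one cut-off position.
type point struct {
	TPRate, FPRate, Precision float64
}

// ratio returns a/b, or 0 when b is zero.
func ratio(a, b int) float64 {
	if b == 0 {
		return 0
	}
	return float64(a) / float64(b)
}

// curve sorts the samples by descending probability and records the
// cumulative TP rate, FP rate, and precision after each sample.
func curve(actuals []string, probs []float64, positive string) []point {
	// (1) Build an index sorted by probability, highest first.
	idx := make([]int, len(probs))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		return probs[idx[a]] > probs[idx[b]]
	})

	// Count the positive and negative samples.
	var p, n int
	for _, a := range actuals {
		if a == positive {
			p++
		} else {
			n++
		}
	}

	// (2) Walk the actuals in sorted order; (3) compute the rates.
	points := make([]point, 0, len(idx))
	var tp, fp int
	for _, i := range idx {
		if actuals[i] == positive {
			tp++
		} else {
			fp++
		}
		points = append(points, point{
			TPRate:    ratio(tp, p),
			FPRate:    ratio(fp, n),
			Precision: ratio(tp, tp+fp),
		})
	}
	return points
}

func main() {
	actuals := []string{"1", "1", "0", "0"}
	probs := []float64{0.4, 0.9, 0.1, 0.8}
	for _, pt := range curve(actuals, probs, "1") {
		fmt.Printf("%+v\n", pt)
	}
}
```

Sorting an index slice rather than the data itself keeps the actuals, predictions, and probabilities aligned without copying them.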

func (*Runtime) PrintOobStat

func (rt *Runtime) PrintOobStat(stat *Stat, cm *CM)

PrintOobStat prints the out-of-bag statistic to standard output.

func (*Runtime) PrintStat

func (rt *Runtime) PrintStat(stat *Stat)

PrintStat prints the statistic values to standard output.

func (*Runtime) PrintStatTotal

func (rt *Runtime) PrintStatTotal(st *Stat)

PrintStatTotal prints the total statistic to standard output.

func (*Runtime) StatTotal

func (rt *Runtime) StatTotal() *Stat

StatTotal returns the total statistic.

func (*Runtime) WriteOOBStat

func (rt *Runtime) WriteOOBStat(stat *Stat) error

WriteOOBStat writes the statistic of the process to file.

func (*Runtime) WritePerformance

func (rt *Runtime) WritePerformance() error

WritePerformance writes the performance data to file.

type Stat

type Stat struct {
	// ID is the unique id for this statistic (e.g. number of tree).
	ID int64

	// StartTime contains the start time of the classifier as a unix
	// timestamp.
	StartTime int64

	// EndTime contains the end time of the classifier as a unix timestamp.
	EndTime int64

	// ElapsedTime contains the elapsed time, in seconds, between the start
	// and end time.
	ElapsedTime int64

	// TP contains the true-positive value.
	TP int64

	// FP contains the false-positive value.
	FP int64

	// TN contains the true-negative value.
	TN int64

	// FN contains the false-negative value.
	FN int64

	// OobError contains the out-of-bag error.
	OobError float64

	// OobErrorMean contains the mean of the out-of-bag error.
	OobErrorMean float64

	// TPRate contains the true-positive rate (recall): tp/(tp+fn)
	TPRate float64

	// FPRate contains the false-positive rate: fp/(fp+tn)
	FPRate float64

	// TNRate contains the true-negative rate: tn/(tn+fp)
	TNRate float64

	// Precision contains: tp/(tp+fp)
	Precision float64

	// FMeasure contains the value of the F-measure, or the harmonic mean
	// of precision and recall.
	FMeasure float64

	// Accuracy contains the degree of closeness of measurements of a
	// quantity to that quantity's true value.
	Accuracy float64

	// AUC contains the area under the curve.
	AUC float64
}

Stat holds the statistic values of a classifier, including TP rate, FP rate, precision, and recall.
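The rate formulas given in the field comments can be checked with a small standalone sketch. The `rates` type and `ratesFromCounts` function are hypothetical and use only the formulas stated above. Returning 0 for a zero denominator is an assumption.

```go
package main

import "fmt"

// rates is a hypothetical holder for the derived values.
type rates struct {
	TPRate, FPRate, TNRate, Precision float64
}

// div returns a/b as float64, or 0 when b is zero.
func div(a, b int64) float64 {
	if b == 0 {
		return 0
	}
	return float64(a) / float64(b)
}

// ratesFromCounts applies the formulas from the field comments:
// tp/(tp+fn), fp/(fp+tn), tn/(tn+fp), and tp/(tp+fp).
func ratesFromCounts(tp, fp, tn, fn int64) rates {
	return rates{
		TPRate:    div(tp, tp+fn),
		FPRate:    div(fp, fp+tn),
		TNRate:    div(tn, tn+fp),
		Precision: div(tp, tp+fp),
	}
}

func main() {
	r := ratesFromCounts(2, 2, 1, 1)
	fmt.Printf("%.3f %.3f %.3f %.3f\n",
		r.TPRate, r.FPRate, r.TNRate, r.Precision) // 0.667 0.667 0.333 0.500
}
```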

func (*Stat) End

func (stat *Stat) End()

End stops the timer and computes the elapsed time.

func (*Stat) Recall

func (stat *Stat) Recall() float64

Recall returns the recall value.

func (*Stat) SetAUC

func (stat *Stat) SetAUC(v float64)

SetAUC sets the AUC value.

func (*Stat) SetFPRate

func (stat *Stat) SetFPRate(fp, n int64)

SetFPRate sets FP and FPRate using the number of negative samples `n`.

func (*Stat) SetPrecisionFromRate

func (stat *Stat) SetPrecisionFromRate(p, n int64)

SetPrecisionFromRate sets the Precision value using the TP rate and FP rate. `p` and `n` are the numbers of positive and negative class samples.
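One way to derive precision from the two rates, shown here as an assumption rather than the package's confirmed formula: since tp = tprate * p and fp = fprate * n, precision = tp / (tp + fp) follows directly. The function `precisionFromRates` is a hypothetical name.

```go
package main

import "fmt"

// precisionFromRates is a hypothetical helper. It recovers the expected
// counts from the rates (tp = tprate*p, fp = fprate*n) and returns
// tp / (tp + fp), or 0 when nothing is predicted positive.
func precisionFromRates(tprate, fprate float64, p, n int64) float64 {
	tp := tprate * float64(p)
	fp := fprate * float64(n)
	if tp+fp == 0 {
		return 0
	}
	return tp / (tp + fp)
}

func main() {
	// Half of 2 positives and half of 2 negatives are predicted positive.
	fmt.Println(precisionFromRates(0.5, 0.5, 2, 2)) // 0.5
}
```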

func (*Stat) SetTPRate

func (stat *Stat) SetTPRate(tp, p int64)

SetTPRate sets TP and TPRate using the number of positive samples `p`.

func (*Stat) Start

func (stat *Stat) Start()

Start starts the timer.

func (*Stat) Sum

func (stat *Stat) Sum(other *Stat)

Sum adds the statistics from the other stat object to the current stat, excluding the start and end times.

func (*Stat) ToRow

func (stat *Stat) ToRow() (row *tabula.Row)

ToRow converts the stat to a tabula.Row, in the order of the Stat fields.

func (*Stat) Write

func (stat *Stat) Write(file string) (e error)

Write writes the content of the stat to `file`.

type Stats

type Stats []*Stat

Stats defines a list of statistic values.

func (*Stats) Accuracies

func (stats *Stats) Accuracies() (accuracies []float64)

Accuracies returns all accuracy values.

func (*Stats) Add

func (stats *Stats) Add(stat *Stat)

Add appends another stat object to the slice.

func (*Stats) EndTimes

func (stats *Stats) EndTimes() (times []int64)

EndTimes returns all end times as unix timestamps.

func (*Stats) FMeasures

func (stats *Stats) FMeasures() (fmeasures []float64)

FMeasures returns all F-measure values.

func (*Stats) FPRates

func (stats *Stats) FPRates() (fprates []float64)

FPRates returns all false-positive rate values.

func (*Stats) OobErrorMeans

func (stats *Stats) OobErrorMeans() (oobmeans []float64)

OobErrorMeans returns all out-of-bag error mean values.

func (*Stats) Precisions

func (stats *Stats) Precisions() (precs []float64)

Precisions returns all precision values.

func (*Stats) Recalls

func (stats *Stats) Recalls() (recalls []float64)

Recalls returns all recall values.

func (*Stats) StartTimes

func (stats *Stats) StartTimes() (times []int64)

StartTimes returns all start times as unix timestamps.

func (*Stats) TNRates

func (stats *Stats) TNRates() (tnrates []float64)

TNRates returns all true-negative rate values.

func (*Stats) TPRates

func (stats *Stats) TPRates() (tprates []float64)

TPRates returns all true-positive rate values.

func (*Stats) Write

func (stats *Stats) Write(file string) (e error)

Write writes all statistic data to `file`.

Directories

Path	Synopsis
cart	Package cart implements the Classification and Regression Tree by Breiman, et al.
crf	Package crf implements the cascaded random forest algorithm, proposed by Baumann et al. in their paper:
rf	Package rf implements an ensemble of classifiers using the random forest algorithm by Breiman and Cutler.
