rf

package

Version: v0.57.0 | Published: Sep 3, 2024 | License: BSD-3-Clause | Imports: 9 | Imported by: 0

Repository

git.sr.ht/~shulhan/pakakeh.go

Documentation ¶

Overview ¶

Package rf implements an ensemble of classifiers using the random forest algorithm by Breiman and Cutler.

Breiman, Leo. "Random forests." Machine learning 45.1 (2001): 5-32.

The implementation is based on various sources and on the author's experience.

Constants ¶

const (

	// DefNumTree is the default number of trees.
	DefNumTree = 100

	// DefPercentBoot is the default percentage of samples used for
	// bootstrapping a tree.
	DefPercentBoot = 66

	// DefOOBStatsFile is the default OOB statistic output file.
	DefOOBStatsFile = "rf.oob.stat"

	// DefPerfFile is the default performance output file.
	DefPerfFile = "rf.perf"

	// DefStatFile is the default statistic file.
	DefStatFile = "rf.stat"
)

Variables ¶

var (
	// ErrNoInput is returned when no input samples are given.
	ErrNoInput = errors.New("rf: input samples is empty")
)

Functions ¶

This section is empty.

Types ¶

type Runtime ¶

type Runtime struct {

	// Runtime embeds the common fields for a classifier.
	classifier.Runtime

	// NTree is the number of trees in the forest.
	NTree int `json:"NTree"`

	// NRandomFeature is the number of features randomly selected for each tree.
	NRandomFeature int `json:"NRandomFeature"`

	// PercentBoot is the percentage of samples used for bootstrapping.
	PercentBoot int `json:"PercentBoot"`
	// contains filtered or unexported fields
}

Runtime contains the input and output configuration for generating a random forest.
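As a rough illustration of how the JSON tags on the exported fields could map onto a configuration document, the sketch below decodes JSON into a local stand-in struct. The `forestConfig` type, the `parseConfig` helper, and the sample values are hypothetical and not part of this package; the real Runtime also embeds classifier.Runtime, whose fields are not covered here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// forestConfig is a hypothetical stand-in that mirrors only the three
// exported fields and JSON tags documented for Runtime.
type forestConfig struct {
	NTree          int `json:"NTree"`
	NRandomFeature int `json:"NRandomFeature"`
	PercentBoot    int `json:"PercentBoot"`
}

// parseConfig decodes a JSON document into a forestConfig.
func parseConfig(data []byte) (forestConfig, error) {
	var cfg forestConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return forestConfig{}, fmt.Errorf("parseConfig: %w", err)
	}
	return cfg, nil
}

func main() {
	// Illustrative values only; they are not taken from the package.
	input := []byte(`{"NTree": 50, "NRandomFeature": 4, "PercentBoot": 66}`)

	cfg, err := parseConfig(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Fields missing from the JSON document stay at their zero value in this sketch, which is the kind of input that Initialize is documented to replace with defaults.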

func (*Runtime) AddBagIndex ¶

func (forest *Runtime) AddBagIndex(bagIndex []int)

AddBagIndex adds a bagging index for bookkeeping.

func (*Runtime) AddCartTree ¶

func (forest *Runtime) AddCartTree(tree cart.Runtime)

AddCartTree adds a tree to the forest.

func (*Runtime) Build ¶

func (forest *Runtime) Build(samples tabula.ClasetInterface) (e error)

Build builds the forest using the samples dataset.

Algorithm:

(0) Recheck the input values (number of trees, bootstrap percentage, etc.) and open the statistic output file.
(1) For 0 to NTree,
(1.1) create a new tree; repeat until all trees have been built.
(2) Compute and write the total statistic.
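The overall shape of the loop described above might look like the following minimal sketch. It does not use this package: `build`, `treeStat`, the `grow` callback, and `errNoInput` are hypothetical names, the "total statistic" is assumed here to be the mean OOB error, and stopping at the first failed tree is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoInput is a hypothetical local error, loosely modelled on ErrNoInput.
var errNoInput = errors.New("sketch: input samples is empty")

// treeStat is a hypothetical per-tree statistic.
type treeStat struct {
	oobError float64
}

// build checks its inputs, grows nTree trees through the grow callback,
// and returns the mean OOB error over all trees.
func build(nSamples, nTree int, grow func(t int) (treeStat, error)) (float64, error) {
	// (0) Recheck input values.
	if nSamples == 0 {
		return 0, errNoInput
	}
	if nTree <= 0 {
		nTree = 100 // same value as DefNumTree
	}

	// (1) Grow one tree per iteration.
	var total float64
	for t := 0; t < nTree; t++ {
		s, err := grow(t)
		if err != nil {
			return 0, fmt.Errorf("tree %d: %w", t, err)
		}
		total += s.oobError
	}

	// (2) Compute the total statistic (assumed: the mean).
	return total / float64(nTree), nil
}

func main() {
	mean, err := build(150, 4, func(t int) (treeStat, error) {
		// A stub that pretends every tree has the same OOB error.
		return treeStat{oobError: 0.25}, nil
	})
	fmt.Println(mean, err)
}
```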

func (*Runtime) ClassifySet ¶

func (forest *Runtime) ClassifySet(samples tabula.ClasetInterface,
	sampleListID []int,
) (
	predicts []string, cm *classifier.CM, probs []float64,
)

ClassifySet predicts the class of each sample by running it through the forest, and returns the class predictions together with a confusion matrix. `samples` is the set of samples to predict and `sampleListID` holds the indices of those samples. If `sampleListID` is not nil, each sample index is checked in each tree; if the sample was used to train a tree, that tree's vote is not counted.

Algorithm:

(0) Get the value space (the possible class values in the dataset).
(1) For each row in the test set,
(1.1) collect the votes from all trees,
(1.2) select the majority class vote, and
(1.3) compute and save the actual class probabilities.
(2) Compute the confusion matrix from the predictions.
(3) Compute the stat from the confusion matrix.
(4) Write the stat to file only if sampleListID is empty, which means the run is not on an OOB set.
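Steps (1.2), (1.3), and (2) can be sketched in isolation as below. This is not the package's code: `majority` and `confusion` are hypothetical helpers, ties are assumed to go to the class listed first in the value space, and the "probability" is assumed to be the fraction of votes received by the winning class.

```go
package main

import "fmt"

// majority returns the class in valueSpace with the most votes and the
// fraction of all votes it received. Ties go to the class listed first.
func majority(votes, valueSpace []string) (string, float64) {
	if len(votes) == 0 || len(valueSpace) == 0 {
		return "", 0
	}
	counts := make(map[string]int, len(valueSpace))
	for _, v := range votes {
		counts[v]++
	}
	best := valueSpace[0]
	for _, c := range valueSpace[1:] {
		if counts[c] > counts[best] {
			best = c
		}
	}
	return best, float64(counts[best]) / float64(len(votes))
}

// confusion tallies (actual, predicted) pairs, keyed as [actual][predicted].
// It assumes both slices have the same length.
func confusion(actual, predicted []string) map[string]map[string]int {
	cm := make(map[string]map[string]int)
	for i, a := range actual {
		if cm[a] == nil {
			cm[a] = make(map[string]int)
		}
		cm[a][predicted[i]]++
	}
	return cm
}

func main() {
	valueSpace := []string{"no", "yes"}
	// Votes already collected from the trees, one slice per test row.
	perRowVotes := [][]string{
		{"yes", "yes", "no"},
		{"no", "no", "no", "yes"},
	}
	actual := []string{"yes", "yes"}

	predicted := make([]string, len(perRowVotes))
	for i, votes := range perRowVotes {
		class, p := majority(votes, valueSpace)
		predicted[i] = class
		fmt.Printf("row %d: class=%s fraction=%.2f\n", i, class, p)
	}

	cm := confusion(actual, predicted)
	fmt.Println("actual yes, predicted yes:", cm["yes"]["yes"])
	fmt.Println("actual yes, predicted no:", cm["yes"]["no"])
}
```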

func (*Runtime) GrowTree ¶

func (forest *Runtime) GrowTree(samples tabula.ClasetInterface) (
	cm *classifier.CM, stat *classifier.Stat, e error,
)

GrowTree builds a new tree in the forest and returns the OOB error value, or an error if the tree cannot grow.

Algorithm:

(1) Select random samples with replacement, along with the out-of-bag (OOB) samples.
(2) Build the tree using CART, without pruning.
(3) Add the tree to the forest.
(4) Save the indices of the random samples for calculating the error rate later.
(5) Run the OOB samples on the forest.
(6) Calculate the OOB error rate and statistic values.
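Step (1) is standard bootstrap sampling. A minimal, self-contained sketch of drawing a bag with replacement and deriving the out-of-bag rows follows; `bootstrap` and `newRNG` are hypothetical helpers and do not reflect how this package or tabula performs the split.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newRNG returns a seeded random source so runs are repeatable.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// bootstrap draws nBoot row indices from [0, n) with replacement and
// returns them together with the out-of-bag indices, i.e. the rows that
// were never drawn.
func bootstrap(rng *rand.Rand, n, nBoot int) (bag, oob []int) {
	picked := make([]bool, n)
	bag = make([]int, 0, nBoot)
	for i := 0; i < nBoot; i++ {
		idx := rng.Intn(n)
		picked[idx] = true
		bag = append(bag, idx)
	}
	for idx, ok := range picked {
		if !ok {
			oob = append(oob, idx)
		}
	}
	return bag, oob
}

func main() {
	bag, oob := bootstrap(newRNG(1), 10, 6)
	fmt.Println("bag:", bag) // may contain repeated indices
	fmt.Println("oob:", oob) // rows available for OOB error estimation
}
```

Saving `bag` per tree is what later makes it possible to skip a tree when classifying a row it was trained on.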

func (*Runtime) Initialize ¶

func (forest *Runtime) Initialize(samples tabula.ClasetInterface) error

Initialize checks the forest inputs and sets them to default values if they are invalid.

It also calculates the number of random samples for each tree using

number-of-sample * percentage-of-bootstrap
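As a worked example of the formula, the sketch below resets invalid settings to defaults and computes the bootstrap size. The `config` type and `normalize` function are hypothetical; which values count as invalid, and the use of integer division by 100 (which truncates), are assumptions rather than documented behaviour.

```go
package main

import "fmt"

// Local copies of the documented default values.
const (
	defNumTree     = 100 // DefNumTree
	defPercentBoot = 66  // DefPercentBoot
)

// config is a hypothetical stand-in for two of Runtime's exported fields.
type config struct {
	NTree       int
	PercentBoot int
}

// normalize replaces out-of-range settings with the defaults and returns
// the number of bootstrap samples per tree for nSamples input rows.
func normalize(c *config, nSamples int) (int, error) {
	if nSamples <= 0 {
		return 0, fmt.Errorf("normalize: input samples is empty")
	}
	if c.NTree <= 0 {
		c.NTree = defNumTree
	}
	if c.PercentBoot <= 0 || c.PercentBoot > 100 {
		c.PercentBoot = defPercentBoot
	}
	// number-of-sample * percentage-of-bootstrap, with the percentage
	// expressed as an integer out of 100.
	return nSamples * c.PercentBoot / 100, nil
}

func main() {
	c := config{NTree: 0, PercentBoot: 150} // both invalid in this sketch
	nBoot, err := normalize(&c, 150)
	fmt.Println(c.NTree, c.PercentBoot, nBoot, err) // 100 66 99 <nil>
}
```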

func (*Runtime) Trees ¶

func (forest *Runtime) Trees() []cart.Runtime

Trees returns all trees in the forest.

func (*Runtime) Votes ¶

func (forest *Runtime) Votes(sample *tabula.Row, sampleIdx int) (
	votes []string,
)

Votes returns the votes, that is, the classes predicted by each tree, for the given sample. If checkIdx is true, `sampleIdx` is checked against the samples used to train each tree; if it exists there, that tree's vote is skipped.

For each tree: (1) if the row was used to build the tree, skip it; (2) classify the row in the tree; (3) save the tree's class value.
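The three steps can be sketched without the package as follows. `stubTree` and `collectVotes` are hypothetical, each stub tree always predicts one fixed class instead of running CART, and the index check is modelled as an explicit `checkIdx` parameter, which is an assumption since the documented signature has no such argument.

```go
package main

import "fmt"

// stubTree is a hypothetical stand-in for a trained tree: it remembers the
// row indices in its bootstrap bag and always predicts one fixed class.
type stubTree struct {
	bag   []int
	class string
}

// inBag reports whether idx was part of the tree's bootstrap sample.
func (t stubTree) inBag(idx int) bool {
	for _, v := range t.bag {
		if v == idx {
			return true
		}
	}
	return false
}

// collectVotes gathers one vote per tree. When checkIdx is true, trees
// that were trained on the row at sampleIdx are skipped.
func collectVotes(trees []stubTree, sampleIdx int, checkIdx bool) []string {
	var votes []string
	for _, t := range trees {
		// (1) Skip the tree if the row was used to build it.
		if checkIdx && t.inBag(sampleIdx) {
			continue
		}
		// (2) Classify the row and (3) save the tree's class value.
		votes = append(votes, t.class)
	}
	return votes
}

func main() {
	trees := []stubTree{
		{bag: []int{0, 1, 1}, class: "yes"},
		{bag: []int{2, 3, 3}, class: "no"},
		{bag: []int{0, 2, 4}, class: "yes"},
	}
	fmt.Println(collectVotes(trees, 1, true))  // [no yes]
	fmt.Println(collectVotes(trees, 1, false)) // [yes no yes]
}
```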

Source Files ¶

rf.go
