Documentation ¶
Overview ¶
Package rf implements an ensemble of classifiers using the random forest algorithm by Breiman and Cutler.
Breiman, Leo. "Random forests." Machine learning 45.1 (2001): 5-32.
The implementation is based on various sources and on the author's experience.
Index ¶
- Constants
- Variables
- type Runtime
- func (forest *Runtime) AddBagIndex(bagIndex []int)
- func (forest *Runtime) AddCartTree(tree cart.Runtime)
- func (forest *Runtime) Build(samples tabula.ClasetInterface) (e error)
- func (forest *Runtime) ClassifySet(samples tabula.ClasetInterface, sampleListID []int) (predicts []string, cm *classifier.CM, probs []float64)
- func (forest *Runtime) GrowTree(samples tabula.ClasetInterface) (cm *classifier.CM, stat *classifier.Stat, e error)
- func (forest *Runtime) Initialize(samples tabula.ClasetInterface) error
- func (forest *Runtime) Trees() []cart.Runtime
- func (forest *Runtime) Votes(sample *tabula.Row, sampleIdx int) (votes []string)
Constants ¶
const (
	// DefNumTree default number of tree.
	DefNumTree = 100
	// DefPercentBoot default percentage of sample that will be used for
	// bootstraping a tree.
	DefPercentBoot = 66
	// DefOOBStatsFile default statistic file output.
	DefOOBStatsFile = "rf.oob.stat"
	// DefPerfFile default performance file output.
	DefPerfFile = "rf.perf"
	// DefStatFile default statistic file.
	DefStatFile = "rf.stat"
)
Variables ¶
var (
	// ErrNoInput will tell you when no input is given.
	ErrNoInput = errors.New("rf: input samples is empty")
)
Functions ¶
This section is empty.
Types ¶
type Runtime ¶
type Runtime struct {
	// Runtime embed common fields for classifier.
	classifier.Runtime

	// NTree number of tree in forest.
	NTree int `json:"NTree"`
	// NRandomFeature number of feature randomly selected for each tree.
	NRandomFeature int `json:"NRandomFeature"`
	// PercentBoot percentage of sample for bootstraping.
	PercentBoot int `json:"PercentBoot"`
	// contains filtered or unexported fields
}
Runtime contains the input and output configuration used when generating a random forest.
func (*Runtime) AddBagIndex ¶
func (forest *Runtime) AddBagIndex(bagIndex []int)
AddBagIndex adds a bagging index for bookkeeping.
func (*Runtime) AddCartTree ¶
func (forest *Runtime) AddCartTree(tree cart.Runtime)
AddCartTree adds a tree to the forest.
func (*Runtime) Build ¶
func (forest *Runtime) Build(samples tabula.ClasetInterface) (e error)
Build builds the forest using the samples dataset.
Algorithm,
(0) Recheck the input values (number of trees, bootstrap percentage, etc.) and open the statistics output file.
(1) For 0 to NTree,
(1.1) create a new tree, repeating until all trees have been built.
(2) Compute and write the total statistics.
func (*Runtime) ClassifySet ¶
func (forest *Runtime) ClassifySet(samples tabula.ClasetInterface, sampleListID []int) (predicts []string, cm *classifier.CM, probs []float64)
ClassifySet, given a set of samples, predicts their classes by running each sample through the forest, and returns the class predictions together with a confusion matrix. `samples` is the set of samples to be predicted; `sampleListID` holds the indices of those samples. If `sampleListID` is not nil, each sample index is checked against every tree: if the sample was used to train a tree, that tree's vote is not counted.
Algorithm,
(0) Get the value space (the possible class values in the dataset).
(1) For each row in the test set,
(1.1) collect the votes from all trees,
(1.2) select the majority class vote, and
(1.3) compute and save the actual class probabilities.
(2) Compute the confusion matrix from the predictions.
(3) Compute the statistics from the confusion matrix.
(4) Write the statistics to file only if sampleListID is empty, which means the run is not on an OOB set.
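Steps (1) and (2) can be illustrated with a minimal, standalone sketch. It does not use the rf, tabula, or classifier packages; `majorityVote` and `confusionMatrix` are hypothetical helpers. The probability here is taken to be the share of votes won by the majority class, which is an assumption and may differ from how the package fills `probs`.

```go
package main

import (
	"fmt"
	"sort"
)

// majorityVote returns the class with the most votes and the fraction of
// votes it received. Ties are broken by lexical order so the result is
// deterministic. It returns ("", 0) when votes is empty.
func majorityVote(votes []string) (class string, prob float64) {
	if len(votes) == 0 {
		return "", 0
	}
	counts := make(map[string]int)
	for _, v := range votes {
		counts[v]++
	}
	classes := make([]string, 0, len(counts))
	for c := range counts {
		classes = append(classes, c)
	}
	sort.Strings(classes)
	best := classes[0]
	for _, c := range classes[1:] {
		if counts[c] > counts[best] {
			best = c
		}
	}
	return best, float64(counts[best]) / float64(len(votes))
}

// confusionMatrix counts (actual, predicted) pairs. Both slices must have
// the same length.
func confusionMatrix(actuals, predicts []string) map[[2]string]int {
	cm := make(map[[2]string]int)
	for i := range actuals {
		cm[[2]string{actuals[i], predicts[i]}]++
	}
	return cm
}

func main() {
	// Hypothetical votes from three trees for each of three rows.
	votesPerRow := [][]string{
		{"yes", "yes", "no"},
		{"no", "no", "no"},
		{"yes", "no", "no"},
	}
	actuals := []string{"yes", "no", "yes"}

	predicts := make([]string, len(votesPerRow))
	probs := make([]float64, len(votesPerRow))
	for i, rowVotes := range votesPerRow {
		predicts[i], probs[i] = majorityVote(rowVotes)
	}
	cm := confusionMatrix(actuals, predicts)
	fmt.Println("predictions:", predicts)
	fmt.Println("probabilities:", probs)
	fmt.Println("actual yes, predicted no:", cm[[2]string{"yes", "no"}])
}
```

A map keyed by the (actual, predicted) pair keeps the sketch short; a real confusion matrix type would more likely be indexed by the value space obtained in step (0).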
func (*Runtime) GrowTree ¶
func (forest *Runtime) GrowTree(samples tabula.ClasetInterface) (cm *classifier.CM, stat *classifier.Stat, e error)
GrowTree builds a new tree in the forest and returns the OOB error value, or an error if the tree cannot be grown.
Algorithm,
(1) Select random samples with replacement, keeping track of the OOB (out-of-bag) samples.
(2) Build the tree using CART, without pruning.
(3) Add the tree to the forest.
(4) Save the indices of the random samples for calculating the error rate later.
(5) Run the OOB samples on the forest.
(6) Calculate the OOB error rate and statistics values.
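Step (1) is the classic bootstrap. The sketch below shows one way to draw a bag with replacement and derive the OOB indices from it. It is standalone and does not reflect how the package stores its samples; `bootstrap` and `newRNG` are hypothetical names.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newRNG returns a seeded random source so runs are reproducible.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// bootstrap draws n indices from [0, total) with replacement and returns
// them as the bag, together with the out-of-bag (OOB) indices that were
// never drawn. It returns nil slices when total is not positive.
func bootstrap(rng *rand.Rand, total, n int) (bag, oob []int) {
	if total <= 0 {
		return nil, nil
	}
	picked := make([]bool, total)
	bag = make([]int, 0, n)
	for i := 0; i < n; i++ {
		idx := rng.Intn(total)
		picked[idx] = true
		bag = append(bag, idx)
	}
	for idx, used := range picked {
		if !used {
			oob = append(oob, idx)
		}
	}
	return bag, oob
}

func main() {
	bag, oob := bootstrap(newRNG(1), 10, 6)
	fmt.Println("bag:", bag) // may contain repeated indices
	fmt.Println("oob:", oob) // indices never drawn into the bag
}
```

Because the draw is with replacement, the bag can repeat indices, so the OOB set is usually larger than `total - n`.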
func (*Runtime) Initialize ¶
func (forest *Runtime) Initialize(samples tabula.ClasetInterface) error
Initialize checks the forest inputs and sets any invalid ones to their default values.
It also calculates the number of random samples for each tree using,
number-of-sample * percentage-of-bootstrap
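As a worked example of the formula above, the sketch below assumes the percentage is a whole number such as the default 66, so the product is divided by 100 using integer arithmetic. The range treated as valid for the percentage and the truncating division are both assumptions; `numBootSamples` and `defPercentBoot` are hypothetical names.

```go
package main

import "fmt"

// defPercentBoot mirrors the documented DefPercentBoot constant.
const defPercentBoot = 66

// numBootSamples returns how many rows to draw for each tree. A
// percentage outside (0, 100] is assumed to be invalid and is replaced
// by the default. The division truncates toward zero.
func numBootSamples(numSamples, percentBoot int) int {
	if percentBoot <= 0 || percentBoot > 100 {
		percentBoot = defPercentBoot
	}
	return numSamples * percentBoot / 100
}

func main() {
	fmt.Println(numBootSamples(150, 66))
	fmt.Println(numBootSamples(150, 0)) // falls back to the default
}
```

With 150 samples and 66 percent, this gives 150 * 66 / 100 = 99 rows per tree.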
func (*Runtime) Votes ¶
func (forest *Runtime) Votes(sample *tabula.Row, sampleIdx int) (votes []string)
Votes returns the votes, or classes, from each tree for the given sample. If checkIdx is true, `sampleIdx` is checked against the samples used to train each tree; if it was used, the sample is skipped for that tree.
(1) If the row was used to build the tree, skip it,
(2) classify the row in the tree,
(3) save the tree's class value.
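The skipping rule can be sketched with a toy forest. The `tree` type below is a hypothetical stand-in that always predicts one class and remembers its bag; it is not the package's cart.Runtime, and `collectVotes` takes `checkIdx` as an explicit argument purely for illustration.

```go
package main

import "fmt"

// tree is a hypothetical stand-in for a trained tree: it always predicts
// class, and bag holds the sample indices used to train it.
type tree struct {
	class string
	bag   map[int]bool
}

// collectVotes gathers one vote per tree. When checkIdx is true, trees
// whose bag contains sampleIdx are skipped, so only trees that never saw
// the sample during training get to vote.
func collectVotes(trees []tree, sampleIdx int, checkIdx bool) []string {
	var out []string
	for _, tr := range trees {
		if checkIdx && tr.bag[sampleIdx] {
			continue
		}
		out = append(out, tr.class)
	}
	return out
}

func main() {
	forest := []tree{
		{class: "yes", bag: map[int]bool{0: true}},
		{class: "no", bag: map[int]bool{1: true}},
		{class: "yes", bag: map[int]bool{0: true, 1: true}},
	}
	fmt.Println(collectVotes(forest, 0, false)) // every tree votes
	fmt.Println(collectVotes(forest, 0, true))  // only trees that did not train on sample 0
}
```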