smote

package
v0.2.0
Published: Jun 16, 2016 License: BSD-3-Clause Imports: 6 Imported by: 0

Documentation

Overview

Package smote resamples a dataset by applying the Synthetic Minority Oversampling TEchnique (SMOTE). The original dataset must fit entirely in memory. The amount of SMOTE and number of nearest neighbors may be specified. For more information, see

Nitesh V. Chawla et al. (2002). SMOTE: Synthetic Minority Over-sampling
Technique. Journal of Artificial Intelligence Research. 16:321-357.
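
The core of the technique in the cited paper is linear interpolation between a minority sample and one of its nearest neighbors. The following is a minimal sketch of that single step on plain float slices; it does not use this package's types, and the name interpolate is hypothetical. Whether this package draws one gap per synthetic sample (as assumed here) or one per attribute is not stated in this documentation.

```go
package main

import (
	"fmt"
	"math/rand"
)

// interpolate is a hypothetical helper, not part of package smote.
// It returns the point sample + gap*(neighbor-sample), which lies on the
// line segment between sample and neighbor when gap is in [0, 1].
func interpolate(sample, neighbor []float64, gap float64) []float64 {
	synthetic := make([]float64, len(sample))
	for i := range sample {
		synthetic[i] = sample[i] + gap*(neighbor[i]-sample[i])
	}
	return synthetic
}

func main() {
	sample := []float64{1.0, 2.0}
	neighbor := []float64{3.0, 6.0}

	// Assumption: a single gap, drawn uniformly from [0, 1), is shared by
	// all attributes of the synthetic sample.
	gap := rand.Float64()

	fmt.Println(interpolate(sample, neighbor, gap))
}
```

A gap of 0 reproduces the sample and a gap of 1 reproduces the neighbor, so every synthetic point stays inside the region already occupied by the minority class.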

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Runtime added in v0.2.0

type Runtime struct {
	// Runtime embeds the k-nearest-neighbors parameters.
	knn.Runtime
	// PercentOver is the input oversampling percentage.
	PercentOver int `json:"PercentOver"`
	// SyntheticFile is the name of the file where synthetic samples will be written.
	SyntheticFile string `json:"SyntheticFile"`
	// NSynthetic is the input number of new synthetic samples per sample.
	NSynthetic int
	// Synthetics contains the output of resampling, the synthetic samples.
	Synthetics tabula.Dataset
}

Runtime holds the input parameters and the output of resampling.

func New added in v0.2.0

func New(percentOver, k, classIndex int) (smoteRun *Runtime)

New creates and returns a new SMOTE runtime.

func (*Runtime) GetSynthetics added in v0.2.0

func (smote *Runtime) GetSynthetics() tabula.DatasetInterface

GetSynthetics returns the synthetic samples.

func (*Runtime) Init added in v0.2.0

func (smote *Runtime) Init()

Init rechecks the input parameters and sets any invalid value to its default.

func (*Runtime) Resampling added in v0.2.0

func (smote *Runtime) Resampling(dataset tabula.Rows) (e error)

Resampling runs the resampling algorithm using the values defined in `Runtime` and generates the list of synthetic samples. The function itself returns only an error; per the field documentation, the output is held in `Synthetics`.

The `dataset` must contain only the samples of the minority class, not the whole dataset.

Algorithm:

(0) If the oversampling percentage is less than 100, then
(0.1) replace the input dataset with n samples selected at random from the dataset, without replacement, where n is

	(percentage-oversampling / 100) * number-of-sample

(1) For each `sample` in the dataset,
(1.1) find the k nearest neighbors of `sample`,
(1.2) generate synthetic samples from the neighbors.

(2) Write the synthetic samples to a file, only if `SyntheticFile` is not empty.
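
Steps (0) and (1) above can be sketched on plain float slices as follows. This is an illustration under stated assumptions, not this package's implementation: the names distance, nearest, and resample are hypothetical, Euclidean distance is assumed for the neighbor search, the number of synthetic samples per input sample is assumed to be percentOver / 100 as in the cited paper, and step (2), writing to a file, is omitted.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

// distance returns the Euclidean distance between two points (an assumed metric).
func distance(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// nearest returns the indices of up to k points in data that are closest
// to data[idx], excluding idx itself.
func nearest(data [][]float64, idx, k int) []int {
	others := make([]int, 0, len(data))
	for i := range data {
		if i != idx {
			others = append(others, i)
		}
	}
	sort.Slice(others, func(a, b int) bool {
		return distance(data[idx], data[others[a]]) < distance(data[idx], data[others[b]])
	})
	if k > len(others) {
		k = len(others)
	}
	return others[:k]
}

// resample is a hypothetical stand-in for steps (0) and (1).
func resample(minority [][]float64, percentOver, k int, seed int64) [][]float64 {
	if percentOver <= 0 {
		return nil
	}
	rng := rand.New(rand.NewSource(seed))
	samples := minority
	perSample := percentOver / 100 // assumption: synthetic samples per input sample

	if percentOver < 100 {
		// Step (0.1): keep n random samples, chosen without replacement.
		n := percentOver * len(minority) / 100
		samples = make([][]float64, 0, n)
		for _, i := range rng.Perm(len(minority))[:n] {
			samples = append(samples, minority[i])
		}
		perSample = 1
	}

	var synthetics [][]float64
	for i, sample := range samples {
		// Step (1.1): neighbors are searched in the (possibly replaced) dataset.
		neighbors := nearest(samples, i, k)
		if len(neighbors) == 0 {
			continue
		}
		// Step (1.2): interpolate toward a randomly chosen neighbor.
		for j := 0; j < perSample; j++ {
			neighbor := samples[neighbors[rng.Intn(len(neighbors))]]
			gap := rng.Float64()
			synthetic := make([]float64, len(sample))
			for a := range sample {
				synthetic[a] = sample[a] + gap*(neighbor[a]-sample[a])
			}
			synthetics = append(synthetics, synthetic)
		}
	}
	return synthetics
}

func main() {
	minority := [][]float64{{1, 1}, {1, 2}, {2, 1}, {2, 2}}
	for _, s := range resample(minority, 200, 2, 1) {
		fmt.Println(s)
	}
}
```

With four minority samples and an oversampling percentage of 200, the sketch yields eight synthetic samples; with 50, it keeps two of the four samples and yields one synthetic sample for each.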

func (*Runtime) String added in v0.2.0

func (smote *Runtime) String() (s string)

func (*Runtime) Write added in v0.2.0

func (smote *Runtime) Write(file string) error

Write writes the synthetic samples to the file named by `file`.
