Documentation
Overview
Package smote resamples a dataset by applying the Synthetic Minority Oversampling TEchnique (SMOTE). The original dataset must fit entirely in memory. The oversampling percentage and the number of nearest neighbors may be specified. For more information, see:
Nitesh V. Chawla et al. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. 16:321-357.
Types
type Runtime
type Runtime struct {
	// Synthetics contain output of resampling as synthetic samples.
	Synthetics tabula.Dataset
	// SyntheticFile is a filename where synthetic samples will be written.
	SyntheticFile string `json:"SyntheticFile"`
	// Runtime the K-Nearest-Neighbourhood parameters.
	knn.Runtime
	// PercentOver input for oversampling percentage.
	PercentOver int `json:"PercentOver"`
	// NSynthetic input for number of new synthetic per sample.
	NSynthetic int
}
Runtime holds the input parameters and the output of resampling.
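The `json` tags on `SyntheticFile` and `PercentOver` suggest that these parameters can be read from a JSON document. The following is a minimal sketch of that idea only. It assumes a local stand-in struct named `config` that copies the two tagged fields, a hypothetical helper `parseConfig`, and a made-up file name; it does not use the package's own `Runtime` type or the embedded `knn.Runtime` fields, which are not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config is a hypothetical stand-in that mirrors only the two JSON-tagged
// fields of Runtime. It is not the package's own type.
type config struct {
	SyntheticFile string `json:"SyntheticFile"`
	PercentOver   int    `json:"PercentOver"`
}

// parseConfig decodes a JSON document into a config value.
func parseConfig(raw string) (config, error) {
	var cfg config
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	// "synthetic.csv" is an example file name, not a package default.
	cfg, err := parseConfig(`{"SyntheticFile": "synthetic.csv", "PercentOver": 200}`)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println(cfg.SyntheticFile, cfg.PercentOver)
}
```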
func (*Runtime) GetSynthetics
func (smote *Runtime) GetSynthetics() tabula.DatasetInterface
GetSynthetics returns the synthetic samples.
func (*Runtime) Init
func (smote *Runtime) Init()
Init rechecks the input parameters and resets any invalid value to its default.
func (*Runtime) Resampling
Resampling runs the resampling algorithm using the values defined in `Runtime` and returns the list of synthetic samples.
The `dataset` must contain only the samples of the minority class, not the whole dataset.
Algorithm:
(0) If the oversampling percentage is less than 100, then
    (0.1) replace the input dataset with n samples selected at random from it, without replacement, where n = (oversampling percentage / 100) * number of samples.
(1) For each `sample` in the dataset,
    (1.1) find the k nearest neighbors of `sample`,
    (1.2) generate synthetic samples from `sample` and its neighbors.
(2) Write the synthetic samples to a file, only if `SyntheticFile` is not empty.
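Steps (0) and (1) can be sketched in plain Go as follows. This is an illustration under stated assumptions, not the package's implementation: `Sample`, `distance`, `nearest`, and `smote` are hypothetical names, samples are plain numeric slices rather than `tabula` rows, distance is Euclidean, the number of synthetic samples per input sample is assumed to be the percentage divided by 100, and one random gap is drawn per synthetic sample. Step (2), writing to a file, is left out.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

// Sample is a hypothetical stand-in for one row of numeric features.
type Sample []float64

// distance returns the Euclidean distance between a and b.
func distance(a, b Sample) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// nearest returns the indices of up to k samples closest to data[idx],
// excluding idx itself.
func nearest(data []Sample, idx, k int) []int {
	others := make([]int, 0, len(data))
	for i := range data {
		if i != idx {
			others = append(others, i)
		}
	}
	sort.Slice(others, func(x, y int) bool {
		return distance(data[idx], data[others[x]]) < distance(data[idx], data[others[y]])
	})
	if k > len(others) {
		k = len(others)
	}
	return others[:k]
}

// smote sketches steps (0) and (1) for minority-class samples.
func smote(minority []Sample, percentOver, k int, seed int64) []Sample {
	rng := rand.New(rand.NewSource(seed))
	perSample := percentOver / 100 // assumption: synthetics per input sample
	if percentOver < 100 {
		// Step (0): keep n random samples, chosen without replacement.
		n := percentOver * len(minority) / 100
		picked := make([]Sample, 0, n)
		for _, i := range rng.Perm(len(minority))[:n] {
			picked = append(picked, minority[i])
		}
		minority = picked
		perSample = 1
	}
	if k < 1 || len(minority) < 2 {
		return nil // no neighbors to interpolate with
	}
	var synthetics []Sample
	for i, s := range minority {
		nn := nearest(minority, i, k) // step (1.1)
		for j := 0; j < perSample; j++ {
			// Step (1.2): interpolate toward a randomly chosen neighbor.
			nb := minority[nn[rng.Intn(len(nn))]]
			gap := rng.Float64()
			syn := make(Sample, len(s))
			for a := range s {
				syn[a] = s[a] + gap*(nb[a]-s[a])
			}
			synthetics = append(synthetics, syn)
		}
	}
	return synthetics
}

func main() {
	minority := []Sample{{1, 1}, {2, 1}, {1, 2}, {2, 2}}
	out := smote(minority, 200, 3, 1)
	fmt.Println(len(out)) // prints 8: four samples, two synthetics each
}
```

Because each synthetic point is an interpolation between a sample and one of its neighbors, every generated value stays within the range spanned by the minority samples.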