Documentation ¶
Overview ¶
Package smote resamples a dataset by applying the Synthetic Minority Oversampling TEchnique (SMOTE). The original dataset must fit entirely in memory. The amount of SMOTE (the oversampling percentage) and the number of nearest neighbors may be specified. For more information, see:
Nitesh V. Chawla et al. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. 16:321-357.
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Runtime ¶ added in v0.2.0
type Runtime struct {
	// Runtime the K-Nearest-Neighbourhood parameters.
	knn.Runtime
	// PercentOver input for oversampling percentage.
	PercentOver int `json:"PercentOver"`
	// SyntheticFile is a filename where synthetic samples will be written.
	SyntheticFile string `json:"SyntheticFile"`
	// NSynthetic input for number of new synthetic per sample.
	NSynthetic int
	// Synthetics contain output of resampling as synthetic samples.
	Synthetics tabula.Dataset
}
Runtime holds the input parameters and the output of resampling.
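The JSON tags on `PercentOver` and `SyntheticFile` suggest that these parameters can be loaded from a JSON document. The following is a minimal sketch of that idea, not the package's own loader: `config` and `parseConfig` are hypothetical names, `config` mirrors only the two tagged fields shown above (the embedded `knn.Runtime` fields are left out), and the file name `synthetic.csv` is an invented example value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config is a hypothetical stand-in that mirrors only the two JSON-tagged
// fields of Runtime; it is not a type from the smote package.
type config struct {
	PercentOver   int    `json:"PercentOver"`
	SyntheticFile string `json:"SyntheticFile"`
}

// parseConfig decodes a JSON document into a config value.
func parseConfig(raw string) (config, error) {
	var cfg config
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	// Example input; both values are illustrative only.
	cfg, err := parseConfig(`{"PercentOver": 200, "SyntheticFile": "synthetic.csv"}`)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

With the real type, the same JSON keys would presumably be decoded straight into a `Runtime` value, followed by a call to `Init` to validate them.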
func (*Runtime) GetSynthetics ¶ added in v0.2.0
func (smote *Runtime) GetSynthetics() tabula.DatasetInterface
GetSynthetics returns the synthetic samples.
func (*Runtime) Init ¶ added in v0.2.0
func (smote *Runtime) Init()
Init rechecks the input parameters and resets any invalid value to its default.
func (*Runtime) Resampling ¶ added in v0.2.0
Resampling runs the resampling algorithm using the values that have been defined in `Runtime` and returns the list of synthetic samples.
The `dataset` must contain only samples of the minority class, not the whole dataset.
Algorithm:
(0) If the oversampling percentage is less than 100, then
(0.1) replace the input dataset by selecting n random samples from the dataset without replacement, where n is (percentage-oversampling / 100) * number-of-samples.
(1) For each `sample` in the dataset,
(1.1) find the k-nearest-neighbors of `sample`,
(1.2) generate synthetic samples from `sample` and its neighbors.
(2) Write the synthetic samples to a file, only if `SyntheticFile` is not empty.
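Steps (0) and (1) can be sketched in self-contained Go as below. This is an illustration under stated assumptions, not the package's implementation: `Sample`, `distance`, `nearest` and `resample` are hypothetical names, neighbors are found by brute-force Euclidean distance, and each synthetic sample is interpolated at a random point between `sample` and one randomly chosen neighbor, as described in the Chawla et al. paper. It also assumes that a percentage below 100 yields one synthetic sample per kept sample and that the percentage is positive. Step (2), writing to a file, is left out.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

// Sample is a hypothetical stand-in for one numeric row of the minority class.
type Sample []float64

// distance returns the Euclidean distance between two samples of equal length.
func distance(a, b Sample) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// nearest returns the indices of up to k samples closest to data[idx],
// excluding idx itself.
func nearest(data []Sample, idx, k int) []int {
	others := make([]int, 0, len(data)-1)
	for i := range data {
		if i != idx {
			others = append(others, i)
		}
	}
	sort.Slice(others, func(a, b int) bool {
		return distance(data[idx], data[others[a]]) < distance(data[idx], data[others[b]])
	})
	if k > len(others) {
		k = len(others)
	}
	return others[:k]
}

// resample follows steps (0) and (1) for a positive percentOver.
func resample(data []Sample, percentOver, k int, seed int64) []Sample {
	rng := rand.New(rand.NewSource(seed))
	perSample := percentOver / 100
	if percentOver < 100 {
		// Step (0.1): keep n random samples, chosen without replacement.
		n := percentOver * len(data) / 100
		kept := make([]Sample, 0, n)
		for _, i := range rng.Perm(len(data))[:n] {
			kept = append(kept, data[i])
		}
		data = kept
		perSample = 1 // assumption: one synthetic sample per kept sample
	}
	var synthetics []Sample
	for i, sample := range data {
		neighbors := nearest(data, i, k) // step (1.1)
		if len(neighbors) == 0 {
			continue
		}
		for j := 0; j < perSample; j++ {
			// Step (1.2): interpolate between sample and a random neighbor.
			nn := data[neighbors[rng.Intn(len(neighbors))]]
			gap := rng.Float64()
			synthetic := make(Sample, len(sample))
			for a := range sample {
				synthetic[a] = sample[a] + gap*(nn[a]-sample[a])
			}
			synthetics = append(synthetics, synthetic)
		}
	}
	return synthetics
}

func main() {
	minority := []Sample{{0, 0}, {1, 0}, {0, 1}, {1, 1}}
	synthetics := resample(minority, 200, 2, 1)
	fmt.Println("synthetic samples generated:", len(synthetics))
}
```

Because every synthetic sample is a convex combination of two existing samples, it always lies on the line segment between them, so the generated points stay inside the region already covered by the minority class.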