smote

package
v0.55.2
Published: Jul 21, 2024 | License: BSD-3-Clause | Imports: 8 | Imported by: 0

Documentation

Overview

Package smote resamples a dataset by applying the Synthetic Minority Over-sampling Technique (SMOTE). The original dataset must fit entirely in memory. The amount of oversampling (as a percentage) and the number of nearest neighbors may be specified. For more information, see:

Nitesh V. Chawla et al. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16:321-357.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Runtime

type Runtime struct {
	// Synthetics contains the output of resampling as synthetic samples.
	Synthetics tabula.Dataset

	// SyntheticFile is a filename where synthetic samples will be written.
	SyntheticFile string `json:"SyntheticFile"`

	// Runtime embeds the k-nearest-neighbors (KNN) parameters.
	knn.Runtime

	// PercentOver is the oversampling percentage.
	PercentOver int `json:"PercentOver"`

	// NSynthetic is the number of new synthetic samples generated per sample.
	NSynthetic int
}

Runtime holds the input parameters and the output of resampling.
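The documentation does not state how `NSynthetic` relates to `PercentOver`. As a rough illustration only, the sketch below follows the convention in Chawla et al. (2002), where an oversampling percentage of N yields N/100 synthetic samples per minority sample, and a percentage below 100 yields one synthetic sample per retained sample. The helper `syntheticPerSample` is hypothetical and is not part of this package; the package's own derivation may differ.

```go
package main

import "fmt"

// syntheticPerSample is a hypothetical helper, not part of package smote.
// It follows the convention in Chawla et al. (2002): a percentage below
// 100 still yields one synthetic sample per retained minority sample;
// otherwise the count is percentOver/100 using integer division.
// It assumes percentOver has already been validated as positive.
func syntheticPerSample(percentOver int) int {
	if percentOver < 100 {
		return 1
	}
	return percentOver / 100
}

func main() {
	for _, p := range []int{50, 100, 200, 350} {
		fmt.Printf("PercentOver=%d -> %d synthetic per sample\n",
			p, syntheticPerSample(p))
	}
}
```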

func New

func New(percentOver, k, classIndex int) (smoteRun *Runtime)

New creates and returns a new SMOTE runtime.

func (*Runtime) GetSynthetics

func (smote *Runtime) GetSynthetics() tabula.DatasetInterface

GetSynthetics returns the synthetic samples.

func (*Runtime) Init

func (smote *Runtime) Init()

Init rechecks the input values and resets any that are not valid to their defaults.

func (*Runtime) Resampling

func (smote *Runtime) Resampling(dataset tabula.Rows) (e error)

Resampling runs the resampling algorithm using the values defined in `Runtime` and stores the resulting synthetic samples in `Synthetics`.

The `dataset` must contain only samples of the minority class, not the whole dataset.

Algorithm:

(0) If the oversampling percentage is less than 100, then
    (0.1) replace the input dataset by selecting n random samples from the dataset without replacement, where n is

	(percentage-oversampling / 100) * number-of-sample

(1) For each `sample` in the dataset,
    (1.1) find the k-nearest-neighbors of `sample`,
    (1.2) generate synthetic samples from those neighbors.

(2) Write the synthetic samples to a file, only if `SyntheticFile` is not empty.

func (*Runtime) String

func (smote *Runtime) String() (s string)

func (*Runtime) Write

func (smote *Runtime) Write(file string) error

Write writes the synthetic samples to the file named by `file`.
