regression

regression is a simple, written-from-scratch Go library implementing basic variants of two of the most popular models from the Generalized Linear Models (GLM) family.

regression/linear provides an implementation of the linear regression model.

regression/logistic provides an implementation of the logistic regression model.

Install

go get github.com/erni27/regression

Why?

Does the world need another not-so-fancy machine learning library? Actually, no. The main purpose of the regression library is learning. Machine learning and AI algorithms often remain mystified, treated like magic. So here it is: a simple, written-from-scratch implementation of some of the most popular algorithms for solving regression and classification problems. It doesn't delegate the underlying math (like matrix calculus and iterative optimisation) to external packages. Everything is embedded in this repository.

Linear regression

The regression/linear package provides two ways of computing the linear regression coefficients.

The first one uses an iterative approach - the gradient descent algorithm. Options from the regression/options package allow configuring the algorithm parameters listed below:

  • Learning rate - determines the size of each step taken by gradient descent.
  • Gradient descent variant - determines the gradient descent variant (batch or stochastic).
  • Convergence type - determines the convergence type (iterative or automatic). Iterative convergence means that gradient descent runs for exactly i iterations. Automatic convergence, on the other hand, declares convergence if the cost function decreases by less than t in one iteration.
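In each iteration, gradient descent updates every coefficient θj by taking a step of size proportional to the learning rate α along the negative gradient of the cost function J:

θj := θj − α · ∂J(θ)/∂θj

The batch variant computes the gradient over the whole training set in each step, while the stochastic variant updates the coefficients one training example at a time.
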
// Create regression options with:
// 1) Learning rate equal to 1e-8.
// 2) Batch gradient descent variant.
// 3) Iterative convergence with the number of iterations equal to 1000.
opt := options.WithIterativeConvergence(1e-8, options.Batch, 1000)
// Initialize linear regression with gradient descent (numerical approach).
r := linear.WithGradientDescent(opt)
// Create design matrix as a 2D slice.
x := [][]float64{
    {2104, 3},
    {1600, 3},
    {2400, 3},
    {1416, 2},
    {3000, 4},
    {1985, 4},
    {1534, 3},
    {1427, 3},
    {1380, 3},
    {1494, 3},
}
// Create target vector as a slice.
y := []float64{399900, 329900, 369000, 232000, 539900, 299900, 314900, 198999, 212000, 242500}
ctx := context.Background()
// Run linear regression.
m, err := r.Run(ctx, regression.TrainingSet{X: x, Y: y})
if err != nil {
    log.Fatal(err)
}
fmt.Println(m)
acc := m.Accuracy()
fmt.Printf("Accuracy: %f.\n", acc)
coeffs := m.Coefficients()
fmt.Printf("Coefficients: %v.\n", coeffs)
// Do a prediction for a new input feature vector.
in := []float64{2550, 4}
p, err := m.Predict(in)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("For vector %v, predicted value equals %f.\n", in, p)

The preceding code:

  • Initializes regression options.
  • Initializes linear regression with gradient descent and configures it through the created options.
  • Prepares a TrainingSet.
  • Runs linear regression.
  • Predicts a value for a new feature vector.

An automatic convergence can be configured through the WithAutomaticConvergence factory method from the regression/options package.

// Initialize regression options with:
// 1) Learning rate equal to 1e-8.
// 2) Stochastic gradient descent variant.
// 3) Automatic convergence with a threshold equal to 1e-6.
opt := options.WithAutomaticConvergence(1e-8, options.Stochastic, 1e-6)

The automatic convergence test is rarely used in practice since it's really hard to set an appropriate threshold.

The regression/linear package offers a second way of computing the linear regression coefficients - solving the normal equation (analytical approach). To minimize the cost function, it sets its derivatives to zero and solves for the coefficients directly.
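
For a design matrix X and a target vector y, the normal equation gives the coefficients in closed form:

θ = (XᵀX)⁻¹ Xᵀ y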

// Initialize linear regression with normal equation (analytical approach).
r := linear.WithNormalEquation()
// Create design matrix as a 2D slice.
x := [][]float64{
    {2104, 3},
    {1600, 3},
    {2400, 3},
    {1416, 2},
    {3000, 4},
    {1985, 4},
    {1534, 3},
    {1427, 3},
    {1380, 3},
    {1494, 3},
}
// Create target vector as a slice.
y := []float64{399900, 329900, 369000, 232000, 539900, 299900, 314900, 198999, 212000, 242500}
ctx := context.Background()
// Run linear regression.
m, err := r.Run(ctx, regression.TrainingSet{X: x, Y: y})
if err != nil {
    log.Fatal(err)
}
fmt.Println(m)
acc := m.Accuracy()
fmt.Printf("Accuracy: %f.\n", acc)
coeffs := m.Coefficients()
fmt.Printf("Coefficients: %v.\n", coeffs)
// Do a prediction for a new input feature vector.
in := []float64{2550, 4}
p, err := m.Predict(in)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("For vector %v, predicted value equals %f.\n", in, p)

The preceding code:

  • Initializes linear regression with the normal equation.
  • Prepares a TrainingSet.
  • Runs linear regression.
  • Predicts a value for a new feature vector.

With the normal equation, there is no need to choose the learning rate alpha, but it can be slow for a very large number of features since it computes a matrix inversion under the hood.

Logistic regression

regression/logistic, unlike regression/linear, provides only an iterative approach for computing the logistic regression coefficients. It uses exactly the same algorithm as linear regression - gradient descent - so everything regarding gradient descent from the previous section applies here as well.
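
The difference lies in the hypothesis: logistic regression passes the linear combination θᵀx through the sigmoid function, which squashes it into the (0, 1) range before it is mapped to a discrete class:

h(x) = 1 / (1 + e^(−θᵀx))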

// Create regression options with:
// 1) Learning rate equal to 1e-2.
// 2) Batch gradient descent variant.
// 3) Iterative convergence with the number of iterations equal to 100.
opt := options.WithIterativeConvergence(1e-2, options.Batch, 100)
// Initialize logistic regression with gradient descent (numerical approach).
r := logistic.WithGradientDescent(opt)
// Create design matrix as a 2D slice.
x := [][]float64{
    {34, 78},
    {30, 43},
    {35, 72},
    {60, 86},
    {79, 75},
    {45, 56},
    {61, 96},
    {75, 46},
    {76, 87},
    {84, 43},
}
// Create target vector as a slice.
y := []float64{0, 0, 0, 1, 1, 0, 1, 1, 1, 1}
ctx := context.Background()
// Run logistic regression.
m, err := r.Run(ctx, regression.TrainingSet{X: x, Y: y})
if err != nil {
    log.Fatal(err)
}
fmt.Println(m)
acc := m.Accuracy()
fmt.Printf("Accuracy: %f.\n", acc)
coeffs := m.Coefficients()
fmt.Printf("Coefficients: %v.\n", coeffs)
// Do a prediction for a new input feature vector.
in := []float64{52, 88}
p, err := m.Predict(in)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("For vector %v, predicted value equals %d.\n", in, p)

The preceding code:

  • Initializes regression options.
  • Initializes logistic regression with gradient descent and configures it through the created options.
  • Prepares a TrainingSet.
  • Runs logistic regression.
  • Predicts a value for a new feature vector.

Be careful when using automatic convergence with logistic regression. Without feature scaling, it often cannot converge and computes forever (it can be stopped via the Context).
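
A minimal sketch of bounding such a run, reusing r, x and y from the example above - since Run accepts a Context, a timeout or cancellation is assumed here to surface as an error:

// Give training at most 10 seconds before cancelling it.
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
m, err := r.Run(ctx, regression.TrainingSet{X: x, Y: y})
if err != nil {
    // A cancelled or timed-out run is reported through the returned error.
    log.Fatal(err)
}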

Feature scaling

Gradient descent can be much faster when the design matrix consists of features that fall approximately within the same range.

The regression/scaling package contains implementations of two feature scaling techniques:

  • Mean normalization
  • Standardization
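
Given a feature's mean μ, standard deviation σ, and range max − min over the training set, the commonly used definitions of the two techniques are:

mean normalization: x' = (x − μ) / (max − min)
standardization: x' = (x − μ) / σ

Both bring features into comparable ranges.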

The following code presents feature scaling using mean normalization.

x := [][]float64{
    {2104, 3},
    {1600, 3},
    {2400, 3},
    {1416, 2},
    {3000, 4},
    {1985, 4},
    {1534, 3},
    {1427, 3},
    {1380, 3},
    {1494, 3},
}
rs, err := scaling.ScaleDesignMatrix(scaling.Normalization, x)
if err != nil {
    log.Fatal(err)
}
fmt.Println(rs.X)
fmt.Println(rs.Parameters)

scaling.ScaleDesignMatrix returns a Result struct which contains the scaled design matrix X and the scaling parameters Parameters.

Scaling parameters are crucial for further predictions. Since the computed coefficients correspond to the scaled dataset, an input vector passed to the trained model's Predict method must be scaled too. Scaling a single feature vector can be done via the scaling.Scale function.

in := []float64{2550, 4}
in, err = scaling.Scale(in, rs.Parameters)
if err != nil {
    log.Fatal(err)
}
p, err := m.Predict(in)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("For vector %v, predicted value equals %f.\n", in, p)

Documentation

Overview

Package regression defines interfaces, structures and errors shared by other packages that implement a concrete regression algorithm.

Index

Constants

This section is empty.

Variables

var (
	// ErrCannotConverge is returned if gradient descent cannot converge. It usually means that the learning rate is too large.
	ErrCannotConverge = errors.New("cannot converge")
	// ErrUnsupportedGradientDescentVariant is returned if an unsupported gradient descent variant was chosen.
	ErrUnsupportedGradientDescentVariant = errors.New("unsupported gradient descent variant")
	// ErrUnsupportedConvergenceType is returned if an unsupported convergence type was chosen.
	ErrUnsupportedConvergenceType = errors.New("unsupported convergence type")
	// ErrInvalidTrainingSet is returned if a design matrix is invalid or doesn't have the same length as a target vector.
	ErrInvalidTrainingSet = errors.New("invalid training set")
	// ErrInvalidFeatureVector is returned if a feature vector is invalid.
	ErrInvalidFeatureVector = errors.New("invalid feature vector")
	// ErrInvalidDesignMatrix is returned if a design matrix is invalid in a given context.
	ErrInvalidDesignMatrix = errors.New("invalid design matrix")
)

Functions

This section is empty.

Types

type Model

type Model[T TargetType] interface {
	// Predict returns the predicted target value for the given input.
	Predict([]float64) (T, error)
	// Coefficients returns the trained regression model's coefficients.
	Coefficients() []float64
	// Accuracy returns the calculated accuracy for the trained model.
	Accuracy() float64
}

A Model is a trained regression model.

type Regression

type Regression[T TargetType] interface {
	// Run runs the regression against the input training set.
	// It returns a trained Model if it succeeds, otherwise it returns an error.
	Run(context.Context, TrainingSet) (Model[T], error)
}

A Regression is a regression runner. It provides an abstraction for model training.

type RegressionFunc

type RegressionFunc[T TargetType] func(context.Context, TrainingSet) (Model[T], error)

RegressionFunc is an adapter to allow the use of plain functions as regressions.

func (RegressionFunc[T]) Run

func (f RegressionFunc[T]) Run(ctx context.Context, s TrainingSet) (Model[T], error)

Run calls f(ctx, s).
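
A minimal sketch of the adapter in use - myTrain and its no-op body are hypothetical, only RegressionFunc, Regression, Model and TrainingSet come from this package:

// myTrain is a hypothetical training function with the signature expected by RegressionFunc.
func myTrain(ctx context.Context, s regression.TrainingSet) (regression.Model[float64], error) {
	// ... fit coefficients to s and return a Model implementation ...
	return nil, errors.New("not implemented")
}

// Adapt the plain function to the Regression interface.
var r regression.Regression[float64] = regression.RegressionFunc[float64](myTrain)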

type TargetType

type TargetType interface {
	~float64 | ~int
}

TargetType is a constraint that permits one of two types (float64 or int) for the target value. Floating point numbers are used for continuous values of y, while integers correspond to discrete ones.
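
Judging by the examples above, linear regression trains a model with a float64 target and logistic regression one with an int target. A hypothetical helper generic over both target types could look like:

// printPrediction works for any model whose target type satisfies TargetType.
func printPrediction[T regression.TargetType](m regression.Model[T], in []float64) error {
	p, err := m.Predict(in)
	if err != nil {
		return err
	}
	fmt.Printf("For vector %v, predicted value equals %v.\n", in, p)
	return nil
}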

type TrainingSet

type TrainingSet struct {
	// X is a design matrix.
	X [][]float64
	// Y is a target vector.
	Y []float64
}

TrainingSet represents a set of training examples.

Directories

  • internal/gd - Package gd provides a gradient descent implementation.
  • internal/long - Package long provides a wrapper for running long operations.
  • internal/matrix - Package matrix contains the implementation of matrix calculus related to the regression algorithms.
  • internal/ts - Package ts contains the implementation of the operations related to a training set.
  • linear - Package linear provides the linear regression model implementation.
  • logistic - Package logistic provides the logistic regression model implementation.
  • options - Package options contains the implementation of types and constants related to the regression options.
  • scaling - Package scaling contains the implementation of feature scaling.
