regression

regression is a simple, written-from-scratch Go library implementing basic variants of two of the most popular models from the Generalized Linear Models (GLM) family.

regression/linear provides an implementation of the linear regression model.

regression/logistic provides an implementation of the logistic regression model.

Install

go get github.com/erni27/regression

Why?

Does the world need another not-so-fancy machine learning library? Actually, no. The main purpose of the regression library is learning. Machine learning and AI algorithms often remain mystified, treated like magic. So here it is: a simple, written-from-scratch implementation of some of the most popular algorithms for solving regression and classification problems. It doesn't delegate the underlying math (like matrix calculus and iterative optimisation) to external packages. Everything is embedded in this repository.

Linear regression

The regression/linear package provides two ways of computing the linear regression coefficients.

The first one uses an iterative approach - the gradient descent algorithm. Options from the regression/options package allow configuring the algorithm parameters listed below:

  • Learning rate - determines the size of each step taken by gradient descent.
  • Gradient descent variant - determines the gradient descent variant (batch or stochastic).
  • Convergence type - determines the convergence type (iterative or automatic). Iterative convergence means that gradient descent runs for exactly i iterations. Automatic convergence, on the other hand, declares convergence if the cost function decreases by less than t in one iteration.
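In each iteration, gradient descent updates every coefficient θj by taking a step of size proportional to the learning rate α along the negative gradient of the cost function J:

θj := θj − α · ∂J(θ)/∂θj

The batch variant computes the gradient over the whole training set in each step, while the stochastic variant updates the coefficients one training example at a time.
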
// Create regression options with:
// 1) Learning rate equal to 1e-8.
// 2) Batch gradient descent variant.
// 3) Iterative convergence with the number of iterations equal to 1000.
opt := options.WithIterativeConvergence(1e-8, options.Batch, 1000)
// Initialize linear regression with gradient descent (numerical approach).
r := linear.WithGradientDescent(opt)
// Create design matrix as a 2D slice.
x := [][]float64{
    {2104, 3},
    {1600, 3},
    {2400, 3},
    {1416, 2},
    {3000, 4},
    {1985, 4},
    {1534, 3},
    {1427, 3},
    {1380, 3},
    {1494, 3},
}
// Create target vector as a slice.
y := []float64{399900, 329900, 369000, 232000, 539900, 299900, 314900, 198999, 212000, 242500}
ctx := context.Background()
// Run linear regression.
m, err := r.Run(ctx, regression.TrainingSet{X: x, Y: y})
if err != nil {
    log.Fatal(err)
}
fmt.Println(m)
acc := m.Accuracy()
fmt.Printf("Accuracy: %f.\n", acc)
coeffs := m.Coefficients()
fmt.Printf("Coefficients: %v.\n", coeffs)
// Do a prediction for a new input feature vector.
in := []float64{2550, 4}
p, err := m.Predict(in)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("For vector %v, predicted value equals %f.\n", in, p)

The preceding code:

  • Initializes regression options.
  • Initializes linear regression with gradient descent and configures it through the created options.
  • Prepares a TrainingSet.
  • Runs linear regression.
  • Predicts a value for a new feature vector.

An automatic convergence can be configured through the WithAutomaticConvergence factory method from the regression/options package.

// Initialize regression options with:
// 1) Learning rate equal to 1e-8.
// 2) Stochastic gradient descent variant.
// 3) Automatic convergence with a threshold equal to 1e-6.
opt := options.WithAutomaticConvergence(1e-8, options.Stochastic, 1e-6)

The automatic convergence test is rarely used in practice since it's really hard to set an appropriate threshold.

The regression/linear package offers a second way of computing the linear regression coefficients - solving the normal equation (analytical approach). To minimize the cost function, it sets its derivatives to zero and solves for the coefficients directly.
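
For a design matrix X and a target vector y, the normal equation gives the coefficients in closed form:

θ = (XᵀX)⁻¹ Xᵀ y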

// Initialize linear regression with normal equation (analytical approach).
r := linear.WithNormalEquation()
// Create design matrix as a 2D slice.
x := [][]float64{
    {2104, 3},
    {1600, 3},
    {2400, 3},
    {1416, 2},
    {3000, 4},
    {1985, 4},
    {1534, 3},
    {1427, 3},
    {1380, 3},
    {1494, 3},
}
// Create target vector as a slice.
y := []float64{399900, 329900, 369000, 232000, 539900, 299900, 314900, 198999, 212000, 242500}
ctx := context.Background()
// Run linear regression.
m, err := r.Run(ctx, regression.TrainingSet{X: x, Y: y})
if err != nil {
    log.Fatal(err)
}
fmt.Println(m)
acc := m.Accuracy()
fmt.Printf("Accuracy: %f.\n", acc)
coeffs := m.Coefficients()
fmt.Printf("Coefficients: %v.\n", coeffs)
// Do a prediction for a new input feature vector.
in := []float64{2550, 4}
p, err := m.Predict(in)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("For vector %v, predicted value equals %f.\n", in, p)

The preceding code:

  • Initializes linear regression with the normal equation.
  • Prepares a TrainingSet.
  • Runs linear regression.
  • Predicts a value for a new feature vector.

With the normal equation, there is no need to choose the learning rate alpha, but it can be slow for a very large number of features since it computes a matrix inversion under the hood.

Logistic regression

regression/logistic, unlike regression/linear, provides only an iterative approach for computing the logistic regression coefficients. It uses exactly the same algorithm as linear regression - gradient descent - so everything regarding gradient descent from the previous section applies here as well.
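
The difference lies in the hypothesis: logistic regression passes the linear combination θᵀx through the sigmoid function, which squashes it into the (0, 1) range before it is mapped to a discrete class:

h(x) = 1 / (1 + e^(−θᵀx))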

// Create regression options with:
// 1) Learning rate equal to 1e-2.
// 2) Batch gradient descent variant.
// 3) Iterative convergence with the number of iterations equal to 100.
opt := options.WithIterativeConvergence(1e-2, options.Batch, 100)
// Initialize logistic regression with gradient descent (numerical approach).
r := logistic.WithGradientDescent(opt)
// Create design matrix as a 2D slice.
x := [][]float64{
    {34, 78},
    {30, 43},
    {35, 72},
    {60, 86},
    {79, 75},
    {45, 56},
    {61, 96},
    {75, 46},
    {76, 87},
    {84, 43},
}
// Create target vector as a slice.
y := []float64{0, 0, 0, 1, 1, 0, 1, 1, 1, 1}
ctx := context.Background()
// Run logistic regression.
m, err := r.Run(ctx, regression.TrainingSet{X: x, Y: y})
if err != nil {
    log.Fatal(err)
}
fmt.Println(m)
acc := m.Accuracy()
fmt.Printf("Accuracy: %f.\n", acc)
coeffs := m.Coefficients()
fmt.Printf("Coefficients: %v.\n", coeffs)
// Do a prediction for a new input feature vector.
in := []float64{52, 88}
p, err := m.Predict(in)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("For vector %v, predicted value equals %d.\n", in, p)

The preceding code:

  • Initializes regression options.
  • Initializes logistic regression with gradient descent and configures it through the created options.
  • Prepares a TrainingSet.
  • Runs logistic regression.
  • Predicts a value for a new feature vector.

Be careful when using automatic convergence with logistic regression. Without feature scaling, it often cannot converge and computes forever (it can be stopped via the Context).
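
A minimal sketch of bounding such a run, reusing r, x and y from the example above - since Run accepts a Context, a timeout or cancellation is assumed here to surface as an error:

// Give training at most 10 seconds before cancelling it.
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
m, err := r.Run(ctx, regression.TrainingSet{X: x, Y: y})
if err != nil {
    // A cancelled or timed-out run is reported through the returned error.
    log.Fatal(err)
}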

Feature scaling

Gradient descent can be much faster when the design matrix consists of features that fall approximately within the same range.

The regression/scaling package contains implementations of two feature scaling techniques:

  • Mean normalization
  • Standardization
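
Given a feature's mean μ, standard deviation σ, and range max − min over the training set, the commonly used definitions of the two techniques are:

mean normalization: x' = (x − μ) / (max − min)
standardization: x' = (x − μ) / σ

Both bring features into comparable ranges.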

The following code presents feature scaling using mean normalization.

x := [][]float64{
    {2104, 3},
    {1600, 3},
    {2400, 3},
    {1416, 2},
    {3000, 4},
    {1985, 4},
    {1534, 3},
    {1427, 3},
    {1380, 3},
    {1494, 3},
}
rs, err := scaling.ScaleDesignMatrix(scaling.Normalization, x)
if err != nil {
    log.Fatal(err)
}
fmt.Println(rs.X)
fmt.Println(rs.Parameters)

scaling.ScaleDesignMatrix returns a Result struct which contains the scaled design matrix X and the scaling parameters Parameters.

Scaling parameters are crucial for further predictions. Since the computed coefficients correspond to the scaled dataset, an input vector passed to the trained model's Predict method must be scaled too. Scaling a single feature vector can be done via the scaling.Scale function.

in := []float64{2550, 4}
in, err = scaling.Scale(in, rs.Parameters)
if err != nil {
    log.Fatal(err)
}
p, err := m.Predict(in)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("For vector %v, predicted value equals %f.\n", in, p)

Documentation

Overview

Package regression defines interfaces, structures and errors shared by other packages that implement a concrete regression algorithm.

Index

Constants

This section is empty.

Variables

var (
	// ErrCannotConverge is returned if gradient descent cannot converge. It usually means that the learning rate is too large.
	ErrCannotConverge = errors.New("cannot converge")
	// ErrUnsupportedGradientDescentVariant is returned if an unsupported gradient descent variant was chosen.
	ErrUnsupportedGradientDescentVariant = errors.New("unsupported gradient descent variant")
	// ErrUnsupportedConvergenceType is returned if an unsupported convergence type was chosen.
	ErrUnsupportedConvergenceType = errors.New("unsupported convergence type")
	// ErrInvalidTrainingSet is returned if a design matrix is invalid or doesn't have the same length as a target vector.
	ErrInvalidTrainingSet = errors.New("invalid training set")
	// ErrInvalidFeatureVector is returned if a feature vector is invalid.
	ErrInvalidFeatureVector = errors.New("invalid feature vector")
	// ErrInvalidDesignMatrix is returned if a design matrix is invalid in a given context.
	ErrInvalidDesignMatrix = errors.New("invalid design matrix")
)

Functions

This section is empty.

Types

type Model

type Model[T TargetType] interface {
	// Predict returns the predicted target value for the given input.
	Predict([]float64) (T, error)
	// Coefficients returns the trained regression model's coefficients.
	Coefficients() []float64
	// Accuracy returns the calculated accuracy for the trained model.
	Accuracy() float64
}

A Model is a trained regression model.

type Regression

type Regression[T TargetType] interface {
	// Run runs the regression against the input training set.
	// It returns a trained Model if it succeeds, otherwise it returns an error.
	Run(context.Context, TrainingSet) (Model[T], error)
}

A Regression is a regression runner. It provides an abstraction for model training.

type RegressionFunc

type RegressionFunc[T TargetType] func(context.Context, TrainingSet) (Model[T], error)

RegressionFunc is an adapter to allow the use of plain functions as regressions.

func (RegressionFunc[T]) Run

func (f RegressionFunc[T]) Run(ctx context.Context, s TrainingSet) (Model[T], error)

Run calls f(ctx, s).
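
A minimal sketch of the adapter in use - myTrain and its no-op body are hypothetical, only RegressionFunc, Regression, Model and TrainingSet come from this package:

// myTrain is a hypothetical training function with the signature expected by RegressionFunc.
func myTrain(ctx context.Context, s regression.TrainingSet) (regression.Model[float64], error) {
	// ... fit coefficients to s and return a Model implementation ...
	return nil, errors.New("not implemented")
}

// Adapt the plain function to the Regression interface.
var r regression.Regression[float64] = regression.RegressionFunc[float64](myTrain)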

type TargetType

type TargetType interface {
	~float64 | ~int
}

TargetType is a constraint that permits one of two types (float64 or int) for the target value. Floating point numbers are used for continuous values of y, while integers correspond to discrete ones.
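
Judging by the examples above, linear regression trains a model with a float64 target and logistic regression one with an int target. A hypothetical helper generic over both target types could look like:

// printPrediction works for any model whose target type satisfies TargetType.
func printPrediction[T regression.TargetType](m regression.Model[T], in []float64) error {
	p, err := m.Predict(in)
	if err != nil {
		return err
	}
	fmt.Printf("For vector %v, predicted value equals %v.\n", in, p)
	return nil
}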

type TrainingSet

type TrainingSet struct {
	// X is a design matrix.
	X [][]float64
	// Y is a target vector.
	Y []float64
}

TrainingSet represents a set of training examples.

Directories

  • internal/gd - Package gd provides a gradient descent implementation.
  • internal/long - Package long provides a wrapper for running long operations.
  • internal/matrix - Package matrix contains the implementation of matrix calculus related to the regression algorithms.
  • internal/ts - Package ts contains the implementation of the operations related to a training set.
  • linear - Package linear provides the linear regression model implementation.
  • logistic - Package logistic provides the logistic regression model implementation.
  • options - Package options contains the implementation of types and constants related to the regression options.
  • scaling - Package scaling contains the implementation of feature scaling.
