bandit

package
v0.0.0-...-f32f910
Published: Apr 2, 2022 License: MIT Imports: 2 Imported by: 0

README

Bandit

Implements multi-armed bandit strategies for random choice.

Documentation

Overview

Package bandit implements multi-armed bandit strategies for random choice.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AnnealingEpsilonGreedy

type AnnealingEpsilonGreedy struct {
	// contains filtered or unexported fields
}

AnnealingEpsilonGreedy implements a reinforcement learning strategy in which the value of epsilon starts small and then grows, leading to an exploratory learning strategy at the start and a growing preference for exploitation as more selections are made.

func (*AnnealingEpsilonGreedy) Counts

func (b *AnnealingEpsilonGreedy) Counts() []uint64

Counts returns the frequency with which each arm was selected.

func (*AnnealingEpsilonGreedy) Epsilon

func (b *AnnealingEpsilonGreedy) Epsilon() float64

Epsilon is computed from the current number of trials such that the more trials have occurred, the smaller epsilon is (on a log scale).
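
A plausible schedule consistent with this description (not necessarily the exact formula used by the package) computes epsilon as the inverse log of the total number of trials:

import "math"

// annealedEpsilon shrinks on a log scale as the total trial count grows.
// The 1e-7 term only guards against log(1) = 0 and is itself an assumption.
func annealedEpsilon(totalTrials uint64) float64 {
	t := float64(totalTrials) + 1
	return 1.0 / math.Log(t+1e-7)
}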

func (*AnnealingEpsilonGreedy) Init

func (b *AnnealingEpsilonGreedy) Init(nArms int)

Init the bandit with nArms possible choices, which are referred to by index in both the Counts and Values arrays.

func (*AnnealingEpsilonGreedy) Select

func (b *AnnealingEpsilonGreedy) Select() int

Select the arm with the maximizing value with probability epsilon; otherwise select uniformly at random from all arms with probability 1-epsilon.

func (*AnnealingEpsilonGreedy) Serialize

func (b *AnnealingEpsilonGreedy) Serialize() interface{}

Serialize the bandit strategy to dump to JSON.
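
For example, the returned value can be passed directly to encoding/json; this sketch assumes b is an initialized *AnnealingEpsilonGreedy (or any other Strategy):

import "encoding/json"

// Marshal the strategy's serialized form for storage or logging.
data, err := json.Marshal(b.Serialize())
if err != nil {
	// handle the error
}
_ = data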

func (*AnnealingEpsilonGreedy) Update

func (b *AnnealingEpsilonGreedy) Update(arm, reward int)

Update the selected arm with the reward so that the strategy can learn the maximizing value (conditioned on the frequency of selection).
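
A common way to implement this kind of update (the field names here are assumptions, since the actual fields are unexported) is to keep a running mean of the rewards observed for each arm:

// update increments the arm's count and folds the new reward into a running
// mean, so values[arm] converges to that arm's average observed reward.
func update(counts []uint64, values []float64, arm, reward int) {
	counts[arm]++
	n := float64(counts[arm])
	values[arm] = ((n-1)/n)*values[arm] + float64(reward)/n
}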

func (*AnnealingEpsilonGreedy) Values

func (b *AnnealingEpsilonGreedy) Values() []float64

Values returns the reward distribution for each arm.

type EpsilonGreedy

type EpsilonGreedy struct {
	Epsilon float64 // Probability of selecting maximizing value
	// contains filtered or unexported fields
}

EpsilonGreedy implements a reinforcement learning strategy such that the maximizing value is selected with probability epsilon and a uniform random selection is made with probability 1-epsilon.
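
The selection rule can be sketched as follows; argMax and the values slice are illustrative stand-ins for the unexported internals, not the package's own code:

import "math/rand"

// selectArm exploits the best-estimated arm with probability epsilon and
// explores uniformly at random with probability 1-epsilon.
func selectArm(epsilon float64, values []float64) int {
	if rand.Float64() < epsilon {
		return argMax(values) // exploit the current best estimate
	}
	return rand.Intn(len(values)) // explore uniformly
}

// argMax returns the index of the largest value.
func argMax(values []float64) int {
	best := 0
	for i, v := range values {
		if v > values[best] {
			best = i
		}
	}
	return best
}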

func (*EpsilonGreedy) Counts

func (b *EpsilonGreedy) Counts() []uint64

Counts returns the frequency with which each arm was selected.

func (*EpsilonGreedy) Init

func (b *EpsilonGreedy) Init(nArms int)

Init the bandit with nArms possible choices, which are referred to by index in both the Counts and Values arrays.

func (*EpsilonGreedy) Select

func (b *EpsilonGreedy) Select() int

Select the arm with the maximizing value with probability epsilon; otherwise select uniformly at random from all arms with probability 1-epsilon.

func (*EpsilonGreedy) Serialize

func (b *EpsilonGreedy) Serialize() interface{}

Serialize the bandit strategy to dump to JSON.

func (*EpsilonGreedy) Update

func (b *EpsilonGreedy) Update(arm, reward int)

Update the selected arm with the reward so that the strategy can learn the maximizing value (conditioned on the frequency of selection).

func (*EpsilonGreedy) Values

func (b *EpsilonGreedy) Values() []float64

Values returns the reward distribution for each arm.

type Strategy

type Strategy interface {
	Init(nArms int)         // Initialize the bandit with n choices
	Select() int            // Selects an arm and returns the index of the choice
	Update(arm, reward int) // Update the given arm with a reward
	Counts() []uint64       // The frequency of each arm being selected
	Values() []float64      // The reward distributions for each arm
	Serialize() interface{} // Return a JSON representation of the strategy
}

Strategy specifies the methods required by an algorithm to compute multi-armed bandit probabilities for reinforcement learning. The basic mechanism allows you to initialize a strategy with n arms (or n choices). The Select() method returns the index of the chosen arm based on the internal strategy, and the Update() method allows external callers to report the observed reward for the selected arm.
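
A minimal sketch of that mechanism, assuming the Epsilon field can be set in a composite literal before Init allocates the internal state; pull is a hypothetical reward function supplied by the caller, and any of the concrete types below satisfies Strategy:

var s Strategy = &EpsilonGreedy{Epsilon: 0.8}
s.Init(4) // four arms, indexed 0 through 3

for i := 0; i < 1000; i++ {
	arm := s.Select()     // choose an arm according to the strategy
	reward := pull(arm)   // externally observed reward for that arm
	s.Update(arm, reward) // feed the reward back so the strategy can learn
}

fmt.Println(s.Counts()) // how often each arm was chosen
fmt.Println(s.Values()) // estimated reward for each arm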

type Uniform

type Uniform struct {
	// contains filtered or unexported fields
}

Uniform selects all values with equal likelihood on every selection. While it tracks the frequency of selection and the observed rewards, this information does not affect the way it selects values.
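
In effect, selection reduces to drawing a uniformly random index; a sketch, not the package's code:

import "math/rand"

// selectUniform ignores the recorded counts and values entirely and picks
// each of the nArms choices with equal probability.
func selectUniform(nArms int) int {
	return rand.Intn(nArms)
}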

func (*Uniform) Counts

func (b *Uniform) Counts() []uint64

Counts returns the frequency with which each arm was selected.

func (*Uniform) Init

func (b *Uniform) Init(nArms int)

Init the bandit with nArms possible choices, which are referred to by index in both the Counts and Values arrays.

func (*Uniform) Select

func (b *Uniform) Select() int

Select the arm with equal probability for each choice.

func (*Uniform) Serialize

func (b *Uniform) Serialize() interface{}

Serialize the bandit strategy to dump to JSON.

func (*Uniform) Update

func (b *Uniform) Update(arm, reward int)

Update the selected arm with the reward so that the strategy can learn the maximizing value (conditioned on the frequency of selection).

func (*Uniform) Values

func (b *Uniform) Values() []float64

Values returns the reward distribution for each arm.
