Documentation ¶
Overview ¶
Package bandit implements multi-armed bandit strategies for randomized selection among competing choices.
Index ¶
- type AnnealingEpsilonGreedy
- func (b *AnnealingEpsilonGreedy) Counts() []uint64
- func (b *AnnealingEpsilonGreedy) Epsilon() float64
- func (b *AnnealingEpsilonGreedy) Init(nArms int)
- func (b *AnnealingEpsilonGreedy) Select() int
- func (b *AnnealingEpsilonGreedy) Serialize() interface{}
- func (b *AnnealingEpsilonGreedy) Update(arm, reward int)
- func (b *AnnealingEpsilonGreedy) Values() []float64
- type EpsilonGreedy
- func (b *EpsilonGreedy) Counts() []uint64
- func (b *EpsilonGreedy) Init(nArms int)
- func (b *EpsilonGreedy) Select() int
- func (b *EpsilonGreedy) Serialize() interface{}
- func (b *EpsilonGreedy) Update(arm, reward int)
- func (b *EpsilonGreedy) Values() []float64
- type Strategy
- type Uniform
- func (b *Uniform) Init(nArms int)
- func (b *Uniform) Serialize() interface{}
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type AnnealingEpsilonGreedy ¶
type AnnealingEpsilonGreedy struct {
// contains filtered or unexported fields
}
AnnealingEpsilonGreedy implements a reinforcement learning strategy in which the value of epsilon starts large and becomes increasingly smaller, leading to an exploratory strategy at the start and a preference for exploitation as more selections are made.
func (*AnnealingEpsilonGreedy) Counts ¶
func (b *AnnealingEpsilonGreedy) Counts() []uint64
Counts returns the frequency with which each arm was selected.
func (*AnnealingEpsilonGreedy) Epsilon ¶
func (b *AnnealingEpsilonGreedy) Epsilon() float64
Epsilon is computed from the current number of trials such that the more trials have occurred, the smaller epsilon is (on a log scale).
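One schedule with this shape, common in the bandit literature, is epsilon = 1/log(t + c) for a small constant c; the following is a sketch of the idea only, and the package's actual schedule and constants may differ:

package main

import (
	"fmt"
	"math"
)

// annealedEpsilon is a hypothetical stand-in for the package's internal
// schedule: epsilon shrinks on a log scale as the trial count t grows.
// Early on the value exceeds 1 (every selection explores); it then decays
// toward 0, so selections increasingly exploit.
func annealedEpsilon(t uint64) float64 {
	return 1.0 / math.Log(float64(t)+1.0000001) // offset avoids log(1) = 0
}

func main() {
	for _, t := range []uint64{1, 10, 100, 1000} {
		fmt.Printf("t=%-4d epsilon=%.3f\n", t, annealedEpsilon(t))
	}
}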
func (*AnnealingEpsilonGreedy) Init ¶
func (b *AnnealingEpsilonGreedy) Init(nArms int)
Init the bandit with nArms possible choices, which are referred to by index in both the Counts and Values arrays.
func (*AnnealingEpsilonGreedy) Select ¶
func (b *AnnealingEpsilonGreedy) Select() int
Select the arm with the maximizing value with probability 1-epsilon; otherwise make a uniform random selection among all arms with probability epsilon.
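As a sketch of this rule (illustrative only, not the package's code; ties at the maximum simply keep the first arm found):

package main

import (
	"fmt"
	"math/rand"
)

// selectArm explores uniformly with probability epsilon and otherwise
// exploits the arm whose estimated value is currently largest.
func selectArm(values []float64, epsilon float64) int {
	if rand.Float64() < epsilon {
		return rand.Intn(len(values)) // explore uniformly
	}
	best := 0
	for i, v := range values {
		if v > values[best] {
			best = i
		}
	}
	return best // exploit the maximizing arm
}

func main() {
	values := []float64{0.1, 0.7, 0.3}
	fmt.Println(selectArm(values, 0.1)) // usually prints 1
}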
func (*AnnealingEpsilonGreedy) Serialize ¶
func (b *AnnealingEpsilonGreedy) Serialize() interface{}
Serialize the bandit strategy into a representation that can be dumped to JSON.
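Because the return value is a plain interface{}, the result can be handed directly to encoding/json. A minimal usage sketch; the import path and the shape of the serialized value are assumptions, not part of this documentation:

package main

import (
	"encoding/json"
	"fmt"
	"log"

	"example.com/bandit" // hypothetical import path; substitute the real one
)

func main() {
	strategy := new(bandit.AnnealingEpsilonGreedy)
	strategy.Init(3)

	// Marshal whatever representation Serialize returns.
	data, err := json.Marshal(strategy.Serialize())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data))
}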
func (*AnnealingEpsilonGreedy) Update ¶
func (b *AnnealingEpsilonGreedy) Update(arm, reward int)
Update the selected arm with the observed reward so that the strategy can learn the maximizing value (estimates are conditioned on the frequency of selection).
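A standard way to keep such frequency-conditioned estimates is an incremental mean; the following sketches the idea, though it is not necessarily the package's exact arithmetic:

package main

import "fmt"

// updateArm folds a new reward into the running mean for one arm: after
// n selections the value is the mean of the rewards seen, so estimates
// are conditioned by how often the arm was chosen.
func updateArm(counts []uint64, values []float64, arm, reward int) {
	counts[arm]++
	n := float64(counts[arm])
	values[arm] = ((n-1)/n)*values[arm] + float64(reward)/n
}

func main() {
	counts := make([]uint64, 2)
	values := make([]float64, 2)
	for _, r := range []int{1, 0, 1} {
		updateArm(counts, values, 0, r)
	}
	fmt.Println(values[0]) // mean of 1, 0, 1 = 0.666...
}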
func (*AnnealingEpsilonGreedy) Values ¶
func (b *AnnealingEpsilonGreedy) Values() []float64
Values returns the reward distribution for each arm.
type EpsilonGreedy ¶
type EpsilonGreedy struct {
	Epsilon float64 // Probability of making a uniform random (exploratory) selection

	// contains filtered or unexported fields
}
EpsilonGreedy implements a reinforcement learning strategy such that a uniform random selection is made with probability epsilon and the maximizing value is selected with probability 1-epsilon.
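Since Epsilon is an exported field, the exploration rate can be set directly when constructing the strategy. A minimal usage sketch; the import path is a placeholder:

package main

import (
	"fmt"

	"example.com/bandit" // hypothetical import path; substitute the real one
)

func main() {
	// Explore 10% of the time; exploit the best-known arm otherwise.
	strategy := &bandit.EpsilonGreedy{Epsilon: 0.1}
	strategy.Init(3)

	arm := strategy.Select() // index of the chosen arm
	strategy.Update(arm, 1)  // reward the chosen arm

	fmt.Println(strategy.Counts(), strategy.Values())
}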
func (*EpsilonGreedy) Counts ¶
func (b *EpsilonGreedy) Counts() []uint64
Counts returns the frequency with which each arm was selected.
func (*EpsilonGreedy) Init ¶
func (b *EpsilonGreedy) Init(nArms int)
Init the bandit with nArms possible choices, which are referred to by index in both the Counts and Values arrays.
func (*EpsilonGreedy) Select ¶
func (b *EpsilonGreedy) Select() int
Select the arm with the maximizing value with probability 1-epsilon; otherwise make a uniform random selection among all arms with probability epsilon.
func (*EpsilonGreedy) Serialize ¶
func (b *EpsilonGreedy) Serialize() interface{}
Serialize the bandit strategy into a representation that can be dumped to JSON.
func (*EpsilonGreedy) Update ¶
func (b *EpsilonGreedy) Update(arm, reward int)
Update the selected arm with the observed reward so that the strategy can learn the maximizing value (estimates are conditioned on the frequency of selection).
func (*EpsilonGreedy) Values ¶
func (b *EpsilonGreedy) Values() []float64
Values returns the reward distribution for each arm.
type Strategy ¶
type Strategy interface {
	Init(nArms int)         // Initialize the bandit with n choices
	Select() int            // Selects an arm and returns the index of the choice
	Update(arm, reward int) // Update the given arm with a reward
	Counts() []uint64       // The frequency of each arm being selected
	Values() []float64      // The reward distributions for each arm
	Serialize() interface{} // Return a JSON representation of the strategy
}
Strategy specifies the methods required by an algorithm to compute multi-armed bandit probabilities for reinforcement learning. The basic mechanism allows you to initialize a strategy with n arms (or n choices). The Select() method will return a selected index based on the internal strategy, and the Update() method allows external callers to update the reward function for the selected arm.
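For example, any Strategy can be exercised against simulated Bernoulli rewards through this interface alone; a sketch (the import path is a placeholder):

package main

import (
	"fmt"
	"math/rand"

	"example.com/bandit" // hypothetical import path; substitute the real one
)

// run exercises any Strategy against simulated Bernoulli rewards: arm i
// pays reward 1 with probability probs[i] and 0 otherwise.
func run(s bandit.Strategy, probs []float64, trials int) {
	s.Init(len(probs))
	for i := 0; i < trials; i++ {
		arm := s.Select()
		reward := 0
		if rand.Float64() < probs[arm] {
			reward = 1
		}
		s.Update(arm, reward)
	}
	fmt.Println(s.Counts(), s.Values())
}

func main() {
	run(new(bandit.AnnealingEpsilonGreedy), []float64{0.2, 0.8}, 1000)
}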
type Uniform ¶
type Uniform struct {
// contains filtered or unexported fields
}
Uniform selects all values with an equal likelihood on every selection. While it tracks the frequency of selection and the reward costs, this information does not affect the way it selects values.
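Selection therefore reduces to a uniform draw over the arm indices; a sketch of the behavior (not the package's actual code):

package main

import (
	"fmt"
	"math/rand"
)

// uniformSelect mirrors the behavior described above: every arm is
// equally likely, regardless of past counts or rewards.
func uniformSelect(nArms int) int {
	return rand.Intn(nArms)
}

func main() {
	fmt.Println(uniformSelect(3)) // 0, 1, or 2 with equal probability
}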
func (*Uniform) Init ¶
func (b *Uniform) Init(nArms int)
Init the bandit with nArms possible choices, which are referred to by index in both the Counts and Values arrays.
func (*Uniform) Serialize ¶
func (b *Uniform) Serialize() interface{}
Serialize the bandit strategy into a representation that can be dumped to JSON.