Documentation ¶
Overview ¶
Package bandit implements multi-armed bandit strategies for randomized selection among competing choices.
Index ¶
- type AnnealingEpsilonGreedy
- func (b *AnnealingEpsilonGreedy) Counts() []uint64
- func (b *AnnealingEpsilonGreedy) Epsilon() float64
- func (b *AnnealingEpsilonGreedy) Init(nArms int)
- func (b *AnnealingEpsilonGreedy) Select() int
- func (b *AnnealingEpsilonGreedy) Serialize() interface{}
- func (b *AnnealingEpsilonGreedy) Update(arm, reward int)
- func (b *AnnealingEpsilonGreedy) Values() []float64
- type EpsilonGreedy
- func (b *EpsilonGreedy) Counts() []uint64
- func (b *EpsilonGreedy) Init(nArms int)
- func (b *EpsilonGreedy) Select() int
- func (b *EpsilonGreedy) Serialize() interface{}
- func (b *EpsilonGreedy) Update(arm, reward int)
- func (b *EpsilonGreedy) Values() []float64
- type Strategy
- type Uniform
- func (b *Uniform) Init(nArms int)
- func (b *Uniform) Serialize() interface{}
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type AnnealingEpsilonGreedy ¶
type AnnealingEpsilonGreedy struct {
// contains filtered or unexported fields
}
AnnealingEpsilonGreedy implements a reinforcement learning strategy in which the value of epsilon starts large and becomes increasingly smaller, leading to an exploratory strategy at the start and a preference for exploitation as more selections are made.
func (*AnnealingEpsilonGreedy) Counts ¶
func (b *AnnealingEpsilonGreedy) Counts() []uint64
Counts returns the frequency with which each arm was selected.
func (*AnnealingEpsilonGreedy) Epsilon ¶
func (b *AnnealingEpsilonGreedy) Epsilon() float64
Epsilon is computed from the current number of trials such that the more trials have occurred, the smaller epsilon is (on a log scale).
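One schedule with this shape, common in the bandit literature, is epsilon = 1/log(t + c) for a small constant c; the following is a sketch of the idea only, and the package's actual schedule and constants may differ:

package main

import (
	"fmt"
	"math"
)

// annealedEpsilon is a hypothetical stand-in for the package's internal
// schedule: epsilon shrinks on a log scale as the trial count t grows.
// Early on the value exceeds 1 (every selection explores); it then decays
// toward 0, so selections increasingly exploit.
func annealedEpsilon(t uint64) float64 {
	return 1.0 / math.Log(float64(t)+1.0000001) // offset avoids log(1) = 0
}

func main() {
	for _, t := range []uint64{1, 10, 100, 1000} {
		fmt.Printf("t=%-4d epsilon=%.3f\n", t, annealedEpsilon(t))
	}
}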
func (*AnnealingEpsilonGreedy) Init ¶
func (b *AnnealingEpsilonGreedy) Init(nArms int)
Init the bandit with nArms possible choices, which are referred to by index in both the Counts and Values arrays.
func (*AnnealingEpsilonGreedy) Select ¶
func (b *AnnealingEpsilonGreedy) Select() int
Select the arm with the maximizing value with probability 1-epsilon; otherwise make a uniform random selection among all arms with probability epsilon.
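As a sketch of this rule (illustrative only, not the package's code; ties at the maximum simply keep the first arm found):

package main

import (
	"fmt"
	"math/rand"
)

// selectArm explores uniformly with probability epsilon and otherwise
// exploits the arm whose estimated value is currently largest.
func selectArm(values []float64, epsilon float64) int {
	if rand.Float64() < epsilon {
		return rand.Intn(len(values)) // explore uniformly
	}
	best := 0
	for i, v := range values {
		if v > values[best] {
			best = i
		}
	}
	return best // exploit the maximizing arm
}

func main() {
	values := []float64{0.1, 0.7, 0.3}
	fmt.Println(selectArm(values, 0.1)) // usually prints 1
}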
func (*AnnealingEpsilonGreedy) Serialize ¶
func (b *AnnealingEpsilonGreedy) Serialize() interface{}
Serialize the bandit strategy into a representation that can be dumped to JSON.
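Because the return value is a plain interface{}, the result can be handed directly to encoding/json. A minimal usage sketch; the import path and the shape of the serialized value are assumptions, not part of this documentation:

package main

import (
	"encoding/json"
	"fmt"
	"log"

	"example.com/bandit" // hypothetical import path; substitute the real one
)

func main() {
	strategy := new(bandit.AnnealingEpsilonGreedy)
	strategy.Init(3)

	// Marshal whatever representation Serialize returns.
	data, err := json.Marshal(strategy.Serialize())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data))
}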
func (*AnnealingEpsilonGreedy) Update ¶
func (b *AnnealingEpsilonGreedy) Update(arm, reward int)
Update the selected arm with the observed reward so that the strategy can learn the maximizing value (estimates are conditioned on the frequency of selection).
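A standard way to keep such frequency-conditioned estimates is an incremental mean; the following sketches the idea, though it is not necessarily the package's exact arithmetic:

package main

import "fmt"

// updateArm folds a new reward into the running mean for one arm: after
// n selections the value is the mean of the rewards seen, so estimates
// are conditioned by how often the arm was chosen.
func updateArm(counts []uint64, values []float64, arm, reward int) {
	counts[arm]++
	n := float64(counts[arm])
	values[arm] = ((n-1)/n)*values[arm] + float64(reward)/n
}

func main() {
	counts := make([]uint64, 2)
	values := make([]float64, 2)
	for _, r := range []int{1, 0, 1} {
		updateArm(counts, values, 0, r)
	}
	fmt.Println(values[0]) // mean of 1, 0, 1 = 0.666...
}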
func (*AnnealingEpsilonGreedy) Values ¶
func (b *AnnealingEpsilonGreedy) Values() []float64
Values returns the reward distribution for each arm.
type EpsilonGreedy ¶
type EpsilonGreedy struct {
	Epsilon float64 // Probability of making a uniform random (exploratory) selection

	// contains filtered or unexported fields
}
EpsilonGreedy implements a reinforcement learning strategy such that a uniform random selection is made with probability epsilon and the maximizing value is selected with probability 1-epsilon.
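Since Epsilon is an exported field, the exploration rate can be set directly when constructing the strategy. A minimal usage sketch; the import path is a placeholder:

package main

import (
	"fmt"

	"example.com/bandit" // hypothetical import path; substitute the real one
)

func main() {
	// Explore 10% of the time; exploit the best-known arm otherwise.
	strategy := &bandit.EpsilonGreedy{Epsilon: 0.1}
	strategy.Init(3)

	arm := strategy.Select() // index of the chosen arm
	strategy.Update(arm, 1)  // reward the chosen arm

	fmt.Println(strategy.Counts(), strategy.Values())
}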
func (*EpsilonGreedy) Counts ¶
func (b *EpsilonGreedy) Counts() []uint64
Counts returns the frequency with which each arm was selected.
func (*EpsilonGreedy) Init ¶
func (b *EpsilonGreedy) Init(nArms int)
Init the bandit with nArms possible choices, which are referred to by index in both the Counts and Values arrays.
func (*EpsilonGreedy) Select ¶
func (b *EpsilonGreedy) Select() int
Select the arm with the maximizing value with probability 1-epsilon; otherwise make a uniform random selection among all arms with probability epsilon.
func (*EpsilonGreedy) Serialize ¶
func (b *EpsilonGreedy) Serialize() interface{}
Serialize the bandit strategy into a representation that can be dumped to JSON.
func (*EpsilonGreedy) Update ¶
func (b *EpsilonGreedy) Update(arm, reward int)
Update the selected arm with the observed reward so that the strategy can learn the maximizing value (estimates are conditioned on the frequency of selection).
func (*EpsilonGreedy) Values ¶
func (b *EpsilonGreedy) Values() []float64
Values returns the reward distribution for each arm.
type Strategy ¶
type Strategy interface {
	Init(nArms int)         // Initialize the bandit with n choices
	Select() int            // Selects an arm and returns the index of the choice
	Update(arm, reward int) // Update the given arm with a reward
	Counts() []uint64       // The frequency of each arm being selected
	Values() []float64      // The reward distributions for each arm
	Serialize() interface{} // Return a JSON representation of the strategy
}
Strategy specifies the methods required by an algorithm to compute multi-armed bandit probabilities for reinforcement learning. The basic mechanism allows you to initialize a strategy with n arms (or n choices). The Select() method will return a selected index based on the internal strategy, and the Update() method allows external callers to update the reward function for the selected arm.
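For example, any Strategy can be exercised against simulated Bernoulli rewards through this interface alone; a sketch (the import path is a placeholder):

package main

import (
	"fmt"
	"math/rand"

	"example.com/bandit" // hypothetical import path; substitute the real one
)

// run exercises any Strategy against simulated Bernoulli rewards: arm i
// pays reward 1 with probability probs[i] and 0 otherwise.
func run(s bandit.Strategy, probs []float64, trials int) {
	s.Init(len(probs))
	for i := 0; i < trials; i++ {
		arm := s.Select()
		reward := 0
		if rand.Float64() < probs[arm] {
			reward = 1
		}
		s.Update(arm, reward)
	}
	fmt.Println(s.Counts(), s.Values())
}

func main() {
	run(new(bandit.AnnealingEpsilonGreedy), []float64{0.2, 0.8}, 1000)
}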
type Uniform ¶
type Uniform struct {
// contains filtered or unexported fields
}
Uniform selects all values with an equal likelihood on every selection. While it tracks the frequency of selection and the reward costs, this information does not affect the way it selects values.
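Selection therefore reduces to a uniform draw over the arm indices; a sketch of the behavior (not the package's actual code):

package main

import (
	"fmt"
	"math/rand"
)

// uniformSelect mirrors the behavior described above: every arm is
// equally likely, regardless of past counts or rewards.
func uniformSelect(nArms int) int {
	return rand.Intn(nArms)
}

func main() {
	fmt.Println(uniformSelect(3)) // 0, 1, or 2 with equal probability
}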
func (*Uniform) Init ¶
func (b *Uniform) Init(nArms int)
Init the bandit with nArms possible choices, which are referred to by index in both the Counts and Values arrays.
func (*Uniform) Serialize ¶
func (b *Uniform) Serialize() interface{}
Serialize the bandit strategy into a representation that can be dumped to JSON.