Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
var DefaultBrainOptions = BrainOptions{
	TemporalWindow:           1,
	ExperienceSize:           30000,
	StartLearnThreshold:      int(math.Floor(math.Min(30000*0.1, 1000))),
	Gamma:                    0.8,
	LearningStepsTotal:       100000,
	LearningStepsBurnin:      3000,
	EpsilonMin:               0.05,
	EpsilonTestTime:          0.01,
	RandomActionDistribution: nil,
	TDTrainerOptions: convnet.TrainerOptions{
		LearningRate: 0.01,
		Momentum:     0.0,
		BatchSize:    64,
		L2Decay:      0.01,
	},
}
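As a rough usage sketch, the defaults can be copied and selectively overridden before constructing a Brain. The snippet below assumes the package is imported under a name like deepqlearn (substitute the real import path) and touches only fields documented in BrainOptions; the chosen values are illustrative, not recommendations.

	// Hypothetical import name; substitute the package's actual import path.
	opts := deepqlearn.DefaultBrainOptions
	opts.TemporalWindow = 2                    // feed the last two (x,a) pairs into the net
	opts.Gamma = 0.9                           // plan further ahead
	opts.HiddenLayerSizes = []int{50, 50}      // illustrative hidden layer sizes
	opts.TDTrainerOptions.LearningRate = 0.001 // slower SGD steps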
Functions ¶
This section is empty.
Types ¶
type Brain ¶
type Brain struct {
	TemporalWindow           int
	ExperienceSize           int
	StartLearnThreshold      int
	Gamma                    float64
	LearningStepsTotal       int
	LearningStepsBurnin      int
	EpsilonMin               float64
	EpsilonTestTime          float64
	RandomActionDistribution []float64

	NetInputs  int
	NumStates  int
	NumActions int
	WindowSize int

	StateWindow  [][]float64
	ActionWindow []int
	RewardWindow []float64
	NetWindow    [][]float64

	Rand      *rand.Rand
	ValueNet  convnet.Net
	TDTrainer *convnet.Trainer

	Experience []Experience

	Age                 int
	ForwardPasses       int
	Epsilon             float64
	LatestReward        float64
	LastInputArray      []float64
	AverageRewardWindow *cnnutil.Window
	AverageLossWindow   *cnnutil.Window
	Learning            bool
}
A Brain object does all the magic. Over time it receives some inputs and some rewards, and its job is to set the outputs so as to maximize the expected reward.
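The learning signal behind "maximize the expected reward" is the temporal-difference target that the TDTrainer regresses the value net towards: the immediate reward plus the discounted value of the best action in the next state. The helper below is purely illustrative (not part of this package's API) and uses only the standard library math package.

	// tdTarget shows the DQN-style regression target: r + Gamma * max over
	// a' of Q(s', a'). nextQ holds the value net's predictions for the next state.
	func tdTarget(reward, gamma float64, nextQ []float64) float64 {
		if len(nextQ) == 0 {
			return reward
		}
		best := math.Inf(-1)
		for _, q := range nextQ {
			if q > best {
				best = q
			}
		}
		return reward + gamma*best
	}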
func (*Brain) NetInput ¶
NetInput returns the state vector s = (x,a,x,a,x,a,xt). It's a concatenation of the last WindowSize (x,a) pairs and the current state x.
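The sketch below illustrates that concatenation: past states interleaved with their actions, followed by the current state. It is not the package's implementation; in particular, the 1-of-k action encoding and the assumption that at least windowSize past steps are available are guesses made for illustration.

	// netInputSketch concatenates the last windowSize (state, action) pairs,
	// encoding each action as a 1-of-k vector, and appends the current state xt.
	// Assumes len(states) >= windowSize and len(actions) == len(states).
	func netInputSketch(xt []float64, states [][]float64, actions []int, numActions, windowSize int) []float64 {
		var s []float64
		n := len(states)
		for i := n - windowSize; i < n; i++ {
			s = append(s, states[i]...)
			oneHot := make([]float64, numActions)
			oneHot[actions[i]] = 1
			s = append(s, oneHot...)
		}
		return append(s, xt...)
	}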
func (*Brain) Policy ¶
Policy computes the value of doing any action in this state and returns the argmax action and its value.
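Once the value net has produced one score per action, the selection step amounts to an argmax. The helper below shows only that step; it is illustrative rather than the package's code and uses the standard library math package.

	// argmaxAction returns the index of the largest predicted action value
	// together with that value.
	func argmaxAction(actionValues []float64) (action int, value float64) {
		value = math.Inf(-1)
		for i, v := range actionValues {
			if v > value {
				action, value = i, v
			}
		}
		return action, value
	}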
func (*Brain) RandomAction ¶
RandomAction is a bit of a helper function. It returns a random action; we are abstracting this away because in the future we may want to do more sophisticated things, for example making some actions more or less likely at the "rest"/default state.
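A sketch of the two behaviours described: uniform sampling when no distribution is supplied, and weighted sampling from RandomActionDistribution otherwise (weights assumed to sum to 1). It uses math/rand and is not the package's implementation.

	// randomActionSketch picks an action uniformly when dist is empty, otherwise
	// samples an index in proportion to the given weights.
	func randomActionSketch(rnd *rand.Rand, numActions int, dist []float64) int {
		if len(dist) == 0 {
			return rnd.Intn(numActions) // uniform choice
		}
		r := rnd.Float64()
		cum := 0.0
		for i, p := range dist {
			cum += p
			if r < cum {
				return i
			}
		}
		return numActions - 1 // guard against floating-point rounding
	}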
type BrainOptions ¶
type BrainOptions struct {
	// TemporalWindow is the amount of temporal memory, in number of time steps.
	// The ACTUAL input to the net will be (x,a) repeated TemporalWindow times, followed by the current x,
	// so set it to 0 to feed no information from previous time steps into the value function.
	TemporalWindow int

	// ExperienceSize is the size of the experience replay memory.
	ExperienceSize int

	// StartLearnThreshold is the number of examples in experience replay memory before we begin learning.
	StartLearnThreshold int

	// Gamma is a crucial parameter that controls how much plan-ahead the agent does. In [0,1].
	Gamma float64

	// LearningStepsTotal is the number of steps we will learn for.
	LearningStepsTotal int

	// LearningStepsBurnin is how many of those steps (at the beginning) perform only random actions.
	LearningStepsBurnin int

	// EpsilonMin is the epsilon value we bottom out on. 0.0 => purely deterministic policy at the end.
	EpsilonMin float64

	// EpsilonTestTime is the epsilon to use at test time (i.e. when learning is disabled).
	EpsilonTestTime float64

	// RandomActionDistribution is an advanced feature. Sometimes a random action should be biased
	// towards some values; for example, in Flappy Bird we may want to choose not to flap more often.
	// It should sum to 1 and have length NumActions.
	RandomActionDistribution []float64

	LayerDefs        []convnet.LayerDef
	HiddenLayerSizes []int
	Rand             *rand.Rand
	TDTrainerOptions convnet.TrainerOptions
}
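The exploration-related fields typically interact as in ConvNetJS's deepqlearn: epsilon decays linearly from 1 to EpsilonMin between LearningStepsBurnin and LearningStepsTotal, and EpsilonTestTime takes over when learning is disabled. The sketch below shows that conventional schedule; the exact schedule used by this port is not shown in this section, so treat it as an assumption. It also assumes LearningStepsTotal > LearningStepsBurnin.

	// epsilonSketch computes an exploration rate from the options: pure
	// exploration during burn-in, then a linear decay down to EpsilonMin,
	// and EpsilonTestTime once learning is disabled.
	func epsilonSketch(age int, o BrainOptions, learning bool) float64 {
		if !learning {
			return o.EpsilonTestTime
		}
		if age < o.LearningStepsBurnin {
			return 1.0 // only random actions during burn-in
		}
		progress := float64(age-o.LearningStepsBurnin) / float64(o.LearningStepsTotal-o.LearningStepsBurnin)
		eps := 1.0 - progress
		if eps < o.EpsilonMin {
			eps = o.EpsilonMin
		}
		return eps
	}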