Documentation ¶
Overview ¶
Package q provides an agent implementation of the Q-learning algorithm.
Index ¶
Constants ¶
This section is empty.
Variables ¶
var DefaultAgentConfig = &AgentConfig{
	Hyperparameters: DefaultHyperparameters,
	Base:            agentv1.NewBase("Q"),
}
DefaultAgentConfig is the default config for a Q agent.
var DefaultHyperparameters = &Hyperparameters{
	Epsilon:    common.NewConstantSchedule(0.1),
	Gamma:      0.6,
	Alpha:      0.1,
	AdaDivisor: 5.0,
}
DefaultHyperparameters are the default hyperparameters for a Q-learning agent.
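As an illustrative sketch, the defaults can be copied and selectively overridden before constructing a config. The import paths below assume the aunum/gold module layout and should be treated as an assumption:

import (
	agentv1 "github.com/aunum/gold/pkg/v1/agent"
	"github.com/aunum/gold/pkg/v1/agent/q"
	"github.com/aunum/gold/pkg/v1/common"
)

// Copy the defaults, then explore a bit more often
// (0.2 is an arbitrary example value).
hp := *q.DefaultHyperparameters
hp.Epsilon = common.NewConstantSchedule(0.2)

cfg := &q.AgentConfig{
	Base:            agentv1.NewBase("Q"),
	Hyperparameters: &hp,
}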
Functions ¶
This section is empty.
Types ¶
type Agent ¶
type Agent struct {
	*agentv1.Base
	*Hyperparameters
	// contains filtered or unexported fields
}
Agent that utilizes the Q-Learning algorithm.
func NewAgent ¶
func NewAgent(c *AgentConfig, env *envv1.Env) *Agent
NewAgent returns a new Q-learning agent.
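A minimal construction sketch, assuming an *envv1.Env has already been obtained elsewhere (environment setup is outside this package's scope):

// Assumption: env is a non-nil environment created by the caller,
// e.g. via the module's envv1 package.
var env *envv1.Env

// Build the agent from the default config shown above.
agent := q.NewAgent(q.DefaultAgentConfig, env)
_ = agent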
type AgentConfig ¶
type AgentConfig struct {
	// Base for the agent.
	Base *agentv1.Base

	// Hyperparameters for the agent.
	*Hyperparameters

	// Table for the agent.
	Table Table
}
AgentConfig is the config for a Q agent.
type Hyperparameters ¶
type Hyperparameters struct {
	// Epsilon is the rate at which the agent should explore vs exploit. The lower the value
	// the more exploitation.
	Epsilon common.Schedule

	// Gamma is the discount factor (0 ≤ γ ≤ 1). It determines how much importance we want to
	// give to future rewards. A high value for the discount factor (close to 1) captures the
	// long-term effective reward, whereas a discount factor of 0 makes our agent consider only
	// the immediate reward, hence making it greedy.
	Gamma float32

	// Alpha is the learning rate (0 < α ≤ 1). Just like in supervised learning settings, alpha
	// is the extent to which our Q-values are updated in every iteration.
	Alpha float32

	// AdaDivisor is used in adaptive learning to tune the hyperparameters.
	AdaDivisor float32
}
Hyperparameters for a Q-learning agent.
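To make the roles of Alpha and Gamma concrete, here is a standalone sketch of the tabular Q-learning update they govern (plain Go, not part of this package; the map layout is hypothetical):

// qUpdate applies Q(s,a) ← Q(s,a) + α·(r + γ·max_a' Q(s',a') − Q(s,a)).
// table maps a state hash to a slice of per-action Q values and is
// assumed to have a row allocated for s.
func qUpdate(table map[uint32][]float32, s uint32, a int, r float32, sNext uint32, alpha, gamma float32) {
	// max_a' Q(s',a'): the best value attainable from the next state;
	// zero is used as the default for states never seen before.
	var maxNext float32
	if next := table[sNext]; len(next) > 0 {
		maxNext = next[0]
		for _, v := range next[1:] {
			if v > maxNext {
				maxNext = v
			}
		}
	}
	// Alpha scales how far the estimate moves toward the TD target;
	// Gamma discounts the future term, exactly as described above.
	table[s][a] += alpha * (r + gamma*maxNext - table[s][a])
}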
type MemTable ¶
type MemTable struct {
// contains filtered or unexported fields
}
MemTable is an in-memory Table with a row for every state and a column for every action. State is held as a hash of observations.
type Table ¶
type Table interface {
	// GetMax returns the action with the max Q value for a given state hash.
	GetMax(state uint32) (action int, qValue float32, err error)

	// Get the Q value for the given state and action.
	Get(state uint32, action int) (float32, error)

	// Set the Q value of the action taken for a given state.
	Set(state uint32, action int, value float32) error

	// Clear the table.
	Clear() error

	// Pretty print the table.
	Print()
}
Table is the quality table, which stores the quality of an action by state.
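For illustration only, a minimal map-backed implementation of this interface might look like the sketch below (hypothetical; the package's own MemTable is the supported implementation):

import (
	"fmt"
	"math"
)

// memTable is an illustrative, map-backed Table. Rows are created
// lazily; unseen state/action pairs read as zero.
type memTable struct {
	values map[uint32]map[int]float32
}

func (t *memTable) GetMax(state uint32) (int, float32, error) {
	row, ok := t.values[state]
	if !ok || len(row) == 0 {
		return 0, 0, fmt.Errorf("no Q values for state %d", state)
	}
	best, bestQ := 0, float32(math.Inf(-1))
	for action, q := range row {
		if q > bestQ {
			best, bestQ = action, q
		}
	}
	return best, bestQ, nil
}

func (t *memTable) Get(state uint32, action int) (float32, error) {
	return t.values[state][action], nil // zero for unseen pairs
}

func (t *memTable) Set(state uint32, action int, value float32) error {
	if t.values[state] == nil {
		t.values[state] = map[int]float32{}
	}
	t.values[state][action] = value
	return nil
}

func (t *memTable) Clear() error {
	t.values = map[uint32]map[int]float32{}
	return nil
}

func (t *memTable) Print() {
	for state, row := range t.values {
		fmt.Printf("state %d: %v\n", state, row)
	}
}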
func NewMemTable ¶
NewMemTable returns a new MemTable with the dimensions defined by the observation and action space sizes.