Documentation ¶
Overview ¶
Package her is an agent implementation of the Hindsight Experience Replay (HER) algorithm, built on top of a DQN policy.
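A rough end-to-end sketch of constructing an agent is shown below. The import paths assume the aunum/gold module layout, and the environment setup (a Gym-backed local server and the "BitFlipper-v0" name) is illustrative; replace it with however you construct an *envv1.Env:

package main

import (
	"log"

	"github.com/aunum/gold/pkg/v1/agent/her"
	envv1 "github.com/aunum/gold/pkg/v1/env"
)

func main() {
	// Assumed environment setup (Gym-backed local server); swap in your
	// own construction if your setup differs.
	server, err := envv1.NewLocalServer(envv1.GymServerConfig)
	if err != nil {
		log.Fatal(err)
	}
	env, err := server.Make("BitFlipper-v0") // environment name is illustrative
	if err != nil {
		log.Fatal(err)
	}

	// Build a dqn+her agent with the package defaults.
	agent, err := her.NewAgent(her.DefaultAgentConfig, env)
	if err != nil {
		log.Fatal(err)
	}
	_ = agent
}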
Index ¶
Constants ¶
This section is empty.
Variables ¶
var DefaultAgentConfig = &AgentConfig{
	Hyperparameters:  DefaultHyperparameters,
	PolicyConfig:     DefaultPolicyConfig,
	Base:             agentv1.NewBase("HER"),
	SuccessfulReward: 0,
	MemorySize:       1e4,
}
DefaultAgentConfig is the default config for a dqn+her agent.
var DefaultFCLayerBuilder = func(x, y *modelv1.Input) []layer.Config {
	return []layer.Config{
		layer.FC{Input: x.Squeeze()[0], Output: 512},
		layer.FC{Input: 512, Output: 512},
		layer.FC{Input: 512, Output: y.Squeeze()[0], Activation: layer.Linear},
	}
}
DefaultFCLayerBuilder is a default fully connected layer builder.
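A custom architecture can be supplied the same way, since a layer builder is just a function from the model's input and output shapes to a layer stack. A minimal sketch (the smaller layer sizes here are arbitrary):

// SmallFCLayerBuilder is a hypothetical builder producing a smaller network.
var SmallFCLayerBuilder = func(x, y *modelv1.Input) []layer.Config {
	return []layer.Config{
		layer.FC{Input: x.Squeeze()[0], Output: 64},
		layer.FC{Input: 64, Output: y.Squeeze()[0], Activation: layer.Linear},
	}
}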
var DefaultHyperparameters = &Hyperparameters{
	Epsilon:              common.DefaultDecaySchedule(),
	Gamma:                0.9,
	UpdateTargetEpisodes: 50,
}
DefaultHyperparameters are the default hyperparameters.
var DefaultPolicyConfig = &PolicyConfig{
	Loss:         modelv1.MSE,
	Optimizer:    g.NewAdamSolver(g.WithBatchSize(128), g.WithLearnRate(0.0005)),
	LayerBuilder: DefaultFCLayerBuilder,
	BatchSize:    128,
	Track:        true,
}
DefaultPolicyConfig is the default set of hyperparameters for a policy.
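To use a different optimizer or batch size, construct a PolicyConfig directly. A sketch reusing the default layer builder (the learning rate and batch size here are arbitrary):

policyConfig := &her.PolicyConfig{
	Loss:         modelv1.MSE,
	Optimizer:    g.NewAdamSolver(g.WithBatchSize(64), g.WithLearnRate(0.001)),
	LayerBuilder: her.DefaultFCLayerBuilder,
	BatchSize:    64,
	Track:        true,
}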
Functions ¶
Types ¶
type Agent ¶
type Agent struct {
	// Base for the agent.
	*agentv1.Base

	// Hyperparameters for the dqn+her agent.
	*Hyperparameters

	Policy       model.Model
	TargetPolicy model.Model
	Epsilon      common.Schedule
	// contains filtered or unexported fields
}
Agent is a dqn+her agent.
func NewAgent ¶
func NewAgent(c *AgentConfig, env *envv1.Env) (*Agent, error)
NewAgent returns a new dqn+her agent.
type AgentConfig ¶
type AgentConfig struct {
	// Base for the agent.
	Base *agentv1.Base

	// Hyperparameters for the agent.
	*Hyperparameters

	// PolicyConfig for the agent.
	PolicyConfig *PolicyConfig

	// SuccessfulReward is the reward for reaching the goal.
	SuccessfulReward float32

	// MemorySize is the size of the memory.
	MemorySize int
}
AgentConfig is the config for a dqn+her agent.
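Each field can be overridden before the config is handed to NewAgent. For example, a sketch that grants a positive reward on success and enlarges the memory (both values are arbitrary):

config := &her.AgentConfig{
	Base:             agentv1.NewBase("HER"),
	Hyperparameters:  her.DefaultHyperparameters,
	PolicyConfig:     her.DefaultPolicyConfig,
	SuccessfulReward: 1,   // reward granted when the goal is reached
	MemorySize:       1e5, // a larger replay memory
}

The resulting config is then passed to NewAgent along with an environment.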
type Event ¶
type Event struct {
	*envv1.Outcome

	// State in which the action was taken.
	State *tensor.Dense

	// Goal the agent is trying to reach.
	Goal *tensor.Dense
	// contains filtered or unexported fields
}
Event is a single experience that occurred in the environment: the outcome of a step together with the state in which the action was taken and the goal the agent was pursuing.
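Events are what HER relabels in hindsight: after an episode that never reached the goal, each stored step can be replayed with the state that was actually achieved substituted as the goal, which turns the failed trajectory into a successful one for that substitute goal. A conceptual sketch of this relabeling, not the package's internal code (episode and achieved are hypothetical variables):

for _, e := range episode {
	relabeled := her.Event{
		Outcome: e.Outcome, // the step's original outcome
		State:   e.State,
		Goal:    achieved, // pretend the state we ended in was the goal all along
	}
	_ = relabeled // stored in memory alongside the original event
}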
type Hyperparameters ¶
type Hyperparameters struct {
	// Gamma is the discount factor (0 ≤ γ ≤ 1). It determines how much importance is given
	// to future rewards: a discount factor close to 1 captures the long-term effective reward,
	// whereas a discount factor of 0 makes the agent consider only the immediate reward,
	// hence making it greedy.
	Gamma float32

	// Epsilon is the rate at which the agent should exploit vs explore.
	Epsilon common.Schedule

	// UpdateTargetEpisodes determines how often the target network updates its parameters.
	UpdateTargetEpisodes int
}
Hyperparameters for the dqn+her agent.
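Where Gamma and UpdateTargetEpisodes enter the update is easiest to see in the standard DQN bootstrap target for a single transition. A sketch of that computation, for illustration only (not this package's internal code):

// qTarget computes the bootstrapped Q-learning target:
// reward + gamma * max over the target network's next-state Q-values.
// The target network copies the online policy's weights every
// UpdateTargetEpisodes episodes, which keeps this target stable.
func qTarget(reward, gamma float32, nextQ []float32, done bool) float32 {
	if done {
		return reward // no future reward after a terminal step
	}
	best := nextQ[0]
	for _, q := range nextQ[1:] {
		if q > best {
			best = q
		}
	}
	return reward + gamma*best
}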
type LayerBuilder ¶
type LayerBuilder func(x, y *modelv1.Input) []layer.Config

LayerBuilder builds the layers of a policy network from the model's input and output shapes.
type Memory ¶
type Memory struct {
// contains filtered or unexported fields
}
Memory is the replay memory for the dqn+her agent.
type PolicyConfig ¶
type PolicyConfig struct {
	// Loss function to evaluate network performance.
	Loss modelv1.Loss

	// Optimizer to optimize the weights with regard to the error.
	Optimizer g.Solver

	// LayerBuilder is a builder of layers.
	LayerBuilder LayerBuilder

	// BatchSize is the batch size to train on.
	BatchSize int

	// Track is whether to track the model.
	Track bool
}
PolicyConfig is the set of hyperparameters for a policy.