qlearning

Version: v0.0.0-...-09709ec | Published: Sep 22, 2020 | License: MIT | Imports: 7 | Imported by: 0

Repository

github.com/temorfeouz/qlearning

README

qlearning

The qlearning package provides a series of interfaces and utilities to implement the Q-Learning algorithm in Go.

This project was largely inspired by flappybird-qlearning-bot.

Until a release is tagged, qlearning should be considered highly experimental and mostly a fun toy.

Changes in this version: some refactoring, the ability to store the Q-table in a file, and the ability to select the next action by means other than pure random choice. It also adds another example that solves the N-queens problem (with some reservations).

Installation

$ go get github.com/temorfeouz/qlearning

Quickstart

qlearning provides example implementations in the examples directory of the project.

hangman.go provides a naive implementation of Hangman for use with qlearning.

$ cd $GOPATH/src/github.com/temorfeouz/qlearning/examples
$ go run hangman.go -h
Usage of hangman:
  -debug
        Set debug
  -games int
        Play N games (default 5000000)
  -progress int
        Print progress messages every N games (default 1000)
  -wordlist string
        Path to a wordlist (default "./wordlist.txt")
  -words int
        Use N words from wordlist (default 10000)

By default, running hangman.go will play 5,000,000 games against a 10,000-word corpus. That's overkill for just trying out qlearning. Use the -games and -words flags to play fewer games against a smaller vocabulary.

$ go run hangman.go -words 100 -progress 1000 -games 5000
100 words loaded
1000 games played: 92 WINS 908 LOSSES 9% WIN RATE
2000 games played: 447 WINS 1553 LOSSES 36% WIN RATE
3000 games played: 1064 WINS 1936 LOSSES 62% WIN RATE
4000 games played: 1913 WINS 2087 LOSSES 85% WIN RATE
5000 games played: 2845 WINS 2155 LOSSES 93% WIN RATE

Agent performance: 5000 games played, 2845 WINS 2155 LOSSES 57% WIN RATE

"WIN RATE" in each progress report is isolated within that cycle, a group of 1000 games in this example, while the WINS and LOSSES counts are running totals. The win rate is meant to show the velocity of learning by the agent: if it is "learning", the win rate should increase until it converges.

As you can see, within 5000 games the agent is able to "learn" to play hangman against a 100-word vocabulary, reaching a 93% win rate in the final cycle.

Usage

See godocs for the package documentation.

Documentation

Overview

Package qlearning is an experimental set of interfaces and helpers to implement the Q-learning algorithm in Go.

This is highly experimental and should be considered a toy.

See https://github.com/temorfeouz/qlearning/tree/master/examples for implementation examples.

Index

type Action
type Agent
type Rewarder
type SimpleAgent
- func NewSimpleAgent(lr, d float32) *SimpleAgent
type State
type StateAction
- func NewStateAction(state State, action Action, val float32) *StateAction
- func Next(agent Agent, state State, epsilon float32) *StateAction

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Action

type Action interface {
	String() string
	Apply(State) State
}

Action is an interface wrapping an action that can be applied to the model's current state.

BUG (temorfeouz): A state should apply an action, not the other way around.

type Agent

type Agent interface {
	// Learn updates the model for a given state and action, using the
	// provided Rewarder implementation.
	Learn(*StateAction, Rewarder)

	// Value returns the current Q-value for a State and Action.
	Value(State, Action) float32

	// String returns a string representation of the Agent.
	String() string
}

Agent is an interface for a model's agent: it learns from the actions taken and returns the current Q-value of an action in a given state.

type Rewarder

type Rewarder interface {
	// Reward calculates the reward value for a given action in a given
	// state.
	Reward(action *StateAction) float32
}

Rewarder is an interface wrapping the ability to provide a reward for the execution of an action in a given state.
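A Rewarder only needs to map a StateAction to a number. The sketch below is a hypothetical reward scheme for a toy track. The names `pos`, `step`, `goal` and `goalRewarder` are invented for this example, the reward values are arbitrary, and the qlearning types are redeclared locally (mirroring the documented ones) so the snippet compiles on its own.

```go
package main

import "fmt"

// Local copies of the documented qlearning types; real code would use
// qlearning.State, qlearning.Action and qlearning.StateAction.
type State interface {
	String() string
	Next() []Action
}

type Action interface {
	String() string
	Apply(State) State
}

type StateAction struct {
	State  State
	Action Action
	Value  float32
}

// pos is a hypothetical toy state: a cell index on a track ending at goal.
type pos int

const goal pos = 3

func (p pos) String() string { return fmt.Sprint(int(p)) }
func (p pos) Next() []Action  { return []Action{step{}} }

// step is a hypothetical action that advances one cell.
type step struct{}

func (step) String() string      { return "step" }
func (step) Apply(s State) State { return s.(pos) + 1 }

// goalRewarder is a hypothetical Rewarder: 1 for an action that lands on
// the goal, a small penalty otherwise (which favors short paths).
type goalRewarder struct{}

func (goalRewarder) Reward(sa *StateAction) float32 {
	if next, ok := sa.Action.Apply(sa.State).(pos); ok && next == goal {
		return 1
	}
	return -0.1
}

func main() {
	r := goalRewarder{}
	fmt.Println(r.Reward(&StateAction{State: pos(2), Action: step{}})) // 1
	fmt.Println(r.Reward(&StateAction{State: pos(0), Action: step{}})) // -0.1
}
```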

type SimpleAgent

type SimpleAgent struct {
	// contains filtered or unexported fields
}

SimpleAgent is an Agent implementation that stores Q-values in a map of maps.
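The storage fields are unexported, but a map of maps for Q-values typically looks something like the sketch below. The `qTable` type and its `value`/`set` helpers are hypothetical. Keying by the String() of the State and the Action is an assumption, suggested by the State documentation's note that String should be a consistent hash.

```go
package main

import "fmt"

// qTable is a hypothetical stand-in for SimpleAgent's unexported storage:
// Q-values keyed first by the state's string, then by the action's string.
type qTable map[string]map[string]float32

// value returns Q(s, a), defaulting to 0 for pairs never seen before.
// Indexing a missing inner map yields a nil map, and reading from a nil
// map is safe in Go and returns the zero value.
func (q qTable) value(state, action string) float32 {
	return q[state][action]
}

// set stores Q(s, a), creating the inner map on first use.
func (q qTable) set(state, action string, v float32) {
	if q[state] == nil {
		q[state] = make(map[string]float32)
	}
	q[state][action] = v
}

func main() {
	q := qTable{}
	q.set("s0", "right", 0.5)
	fmt.Println(q.value("s0", "right")) // 0.5
	fmt.Println(q.value("s9", "left"))  // 0
}
```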

func NewSimpleAgent

func NewSimpleAgent(lr, d float32) *SimpleAgent

NewSimpleAgent creates a SimpleAgent with the provided learning rate (lr) and discount factor (d).

func (*SimpleAgent) Export

func (agent *SimpleAgent) Export(w io.Writer)

func (*SimpleAgent) Import

func (agent *SimpleAgent) Import(r io.Reader)

func (*SimpleAgent) Learn

func (agent *SimpleAgent) Learn(action *StateAction, reward Rewarder)

Learn updates the existing Q-value for the given State and Action using the Rewarder.

See https://en.wikipedia.org/wiki/Q-learning#Algorithm
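For reference, the tabular update described on that page is Q(s,a) <- Q(s,a) + lr * (r + d * max Q(s',a') - Q(s,a)), where lr is the learning rate and d the discount factor passed to NewSimpleAgent, and the max runs over the actions available in the next state s'. The sketch below computes one such step on plain numbers. It assumes SimpleAgent follows this standard rule; the `update` helper is hypothetical and not part of the package.

```go
package main

import "fmt"

// update applies one tabular Q-learning step:
//
//	Q(s,a) <- Q(s,a) + lr * (reward + discount*max_a' Q(s',a') - Q(s,a))
//
// q is the current Q(s,a), and nextQs holds Q(s',a') for every action
// available in the next state s' (empty if s' is terminal).
func update(q, reward, lr, discount float32, nextQs []float32) float32 {
	var best float32 // max over next actions; stays 0 for a terminal state
	for i, v := range nextQs {
		if i == 0 || v > best {
			best = v
		}
	}
	return q + lr*(reward+discount*best-q)
}

func main() {
	// Learning rate 0.5, discount 0.9; the next state's best Q-value is 2,
	// so the new value is 0 + 0.5*(1 + 0.9*2 - 0) = 1.4.
	fmt.Println(update(0, 1, 0.5, 0.9, []float32{2, -1}))
}
```

A learning rate near 1 lets new experience overwrite old estimates quickly, while a discount near 1 makes the agent value future rewards almost as much as immediate ones.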

func (*SimpleAgent) String

func (agent *SimpleAgent) String() string

String returns the current Q-value map as a printed string.

BUG (temorfeouz): This is useless.

func (*SimpleAgent) Value

func (agent *SimpleAgent) Value(state State, action Action) float32

Value gets the current Q-value for a State and Action.

type State

type State interface {

	// String returns a string representation of the given state.
	// Implementers should take care to ensure that this is a consistent
	// hash for a given state.
	String() string

	// Next provides a slice of possible Actions that could be applied to
	// a state.
	Next() []Action
}

State is an interface wrapping the current state of the model.
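As a rough illustration of how State and Action fit together, here is a sketch of a hypothetical one-dimensional corridor environment. The `corridor` and `move` types are made up for this example, and the interfaces are redeclared locally (mirroring the documented ones) so the snippet compiles without importing the package.

```go
package main

import (
	"fmt"
	"strconv"
)

// Local copies of the documented interfaces; real code would use
// qlearning.State and qlearning.Action.
type State interface {
	String() string
	Next() []Action
}

type Action interface {
	String() string
	Apply(State) State
}

// corridor is a hypothetical state: a position on a track of cells 0..4.
type corridor struct{ pos int }

// String must be a consistent key for the state, because an agent uses it
// to look up Q-values.
func (c corridor) String() string { return strconv.Itoa(c.pos) }

// Next lists only the moves that keep the position on the track.
func (c corridor) Next() []Action {
	var acts []Action
	if c.pos > 0 {
		acts = append(acts, move(-1))
	}
	if c.pos < 4 {
		acts = append(acts, move(+1))
	}
	return acts
}

// move is a hypothetical action that shifts the position by its value.
type move int

func (m move) String() string {
	if m < 0 {
		return "left"
	}
	return "right"
}

// Apply returns the resulting state; it assumes s is a corridor.
func (m move) Apply(s State) State {
	c := s.(corridor)
	return corridor{pos: c.pos + int(m)}
}

func main() {
	s := State(corridor{pos: 0})
	for _, a := range s.Next() {
		fmt.Println(a, "->", a.Apply(s)) // right -> 1
	}
}
```

Because String() serves as a key, two equal positions always produce the same string here, which is what the State documentation asks of implementers.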

type StateAction

type StateAction struct {
	State  State
	Action Action
	Value  float32
}

StateAction is a struct pairing an Action with a given State. Additionally, a Value can be associated with a StateAction; this is typically the Q-value.

func NewStateAction

func NewStateAction(state State, action Action, val float32) *StateAction

NewStateAction creates a new StateAction for a State and Action.

func Next

func Next(agent Agent, state State, epsilon float32) *StateAction

Next uses an Agent and State to find the highest-scored Action.

In the case of Q-value ties for a set of actions, one of the tied actions is selected at random.

Source Files

qlearning.go

Directories

Path	Synopsis
examples/hangman	An example implementation of the qlearning interfaces.
examples/labirint
examples/nqueens
