rl

package
v1.5.11
Published: Oct 27, 2022 License: BSD-3-Clause Imports: 12 Imported by: 0

README

Reinforcement Learning and Dopamine

The rl package provides core infrastructure for dopamine neuromodulation and reinforcement learning, including the Rescorla-Wagner learning algorithm (RW) and Temporal Differences (TD) learning, and a minimal ClampDaLayer that can be used to send an arbitrary DA signal.

  • da.go defines a simple DALayer interface for getting and setting dopamine values, and a SendDA list of layer names with convenience methods for sending dopamine to any layer that implements the DALayer interface.

  • The RW and TD DA layers use the CyclePost layer-level method to send the DA to other layers, at end of each cycle, after activation is updated. Thus, DA lags by 1 cycle, which typically should not be a problem.

  • See the separate pvlv package for the full biologically-based pvlv model on top of this basic DA infrastructure.

  • To encode positive and negative values using spiking, 2 units are used, one for positive and the other for negative. The Act value always represents the (signed) computed value, not the spike rate, where applicable.
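
The one-cycle lag described above can be illustrated with a toy loop that does not depend on axon. This is only a sketch: the `receiver` type and the `runCycles` function are hypothetical names invented for illustration, and the real layers do considerably more work per cycle.

```go
package main

import "fmt"

// receiver is a hypothetical stand-in for a layer that holds a DA value.
type receiver struct {
	DA float32
}

// runCycles simulates a sender whose computed DA on cycle c is computed[c].
// On each cycle the receiver first processes with the DA it currently holds,
// and only afterwards (the CyclePost-style step) is the new value delivered.
// It returns the DA value the receiver saw during each cycle.
func runCycles(computed []float32) []float32 {
	rcv := &receiver{}
	seen := make([]float32, len(computed))
	for c, da := range computed {
		seen[c] = rcv.DA // activation update uses DA sent on the previous cycle
		rcv.DA = da      // send at the end of the cycle
	}
	return seen
}

func main() {
	// The receiver sees each value one cycle after it was computed.
	fmt.Println(runCycles([]float32{0.5, 1, -0.25}))
}
```

The value computed on the final cycle is delivered but only becomes visible on the following cycle, which is why the lag is usually harmless for signals that persist over many cycles.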

Documentation

Constants

const (
	// RL is a reinforcement learning layer of any sort
	RL emer.LayerType = emer.LayerType(deep.LayerTypeN) + iota
)

Variables

var (
	// NeuronVars are extra neuron variables for rl
	NeuronVars = []string{"DA"}

	// NeuronVarsAll is the rl collection of all neuron-level vars
	NeuronVarsAll []string
)
var KiT_ClampAChLayer = kit.Types.AddType(&ClampAChLayer{}, LayerProps)
var KiT_ClampDaLayer = kit.Types.AddType(&ClampDaLayer{}, LayerProps)
var KiT_Layer = kit.Types.AddType(&Layer{}, LayerProps)
var KiT_LayerType = kit.Enums.AddEnumExt(deep.KiT_LayerType, LayerTypeN, kit.NotBitFlag, nil)
var KiT_Network = kit.Types.AddType(&Network{}, NetworkProps)
var KiT_RWDaLayer = kit.Types.AddType(&RWDaLayer{}, deep.LayerProps)
var KiT_RWPredLayer = kit.Types.AddType(&RWPredLayer{}, LayerProps)
var KiT_RWPrjn = kit.Types.AddType(&RWPrjn{}, axon.PrjnProps)
var KiT_RewLayer = kit.Types.AddType(&RewLayer{}, LayerProps)
var KiT_TDDaLayer = kit.Types.AddType(&TDDaLayer{}, LayerProps)
var KiT_TDRewIntegLayer = kit.Types.AddType(&TDRewIntegLayer{}, LayerProps)
var KiT_TDRewPredLayer = kit.Types.AddType(&TDRewPredLayer{}, LayerProps)
var KiT_TDRewPredPrjn = kit.Types.AddType(&TDRewPredPrjn{}, axon.PrjnProps)
var LayerProps = ki.Props{
	"EnumType:Typ": KiT_LayerType,
	"ToolBar": ki.PropSlice{
		{"Defaults", ki.Props{
			"icon": "reset",
			"desc": "return all parameters to their intial default values",
		}},
		{"InitWts", ki.Props{
			"icon": "update",
			"desc": "initialize the layer's weight values according to prjn parameters, for all *sending* projections out of this layer",
		}},
		{"InitActs", ki.Props{
			"icon": "update",
			"desc": "initialize the layer's activation values",
		}},
		{"sep-act", ki.BlankProp{}},
		{"LesionNeurons", ki.Props{
			"icon": "close",
			"desc": "Lesion (set the Off flag) for given proportion of neurons in the layer (number must be 0 -- 1, NOT percent!)",
			"Args": ki.PropSlice{
				{"Proportion", ki.Props{
					"desc": "proportion (0 -- 1) of neurons to lesion",
				}},
			},
		}},
		{"UnLesionNeurons", ki.Props{
			"icon": "reset",
			"desc": "Un-Lesion (reset the Off flag) for all neurons in the layer",
		}},
	},
}

LayerProps are required to get the extended EnumType

var NetworkProps = axon.NetworkProps

Functions

func AddRWLayers

func AddRWLayers(nt *axon.Network, prefix string, rel relpos.Relations, space float32) (rew, rp, da axon.AxonLayer)

AddRWLayers adds a simple Rescorla-Wagner (PV only) dopamine system, with a primary Reward layer, a RWPred prediction layer, and a dopamine layer that computes the difference between them. DA is only generated when the Rew layer has external input; otherwise it is zero.

func AddRWLayersPy

func AddRWLayersPy(nt *axon.Network, prefix string, rel relpos.Relations, space float32) []axon.AxonLayer

AddRWLayersPy adds a simple Rescorla-Wagner (PV only) dopamine system, with a primary Reward layer, a RWPred prediction layer, and a dopamine layer that computes the difference between them. DA is only generated when the Rew layer has external input; otherwise it is zero. Py is the Python version, which returns the layers as a slice.

func AddTDLayers

func AddTDLayers(nt *axon.Network, prefix string, rel relpos.Relations, space float32) (rew, rp, ri, td axon.AxonLayer)

AddTDLayers adds the standard TD temporal differences layers, generating a DA signal. The projection from Rew to RewInteg is given class TDRewToInteg; it should have no learning and a weight of 1.

func AddTDLayersPy

func AddTDLayersPy(nt *axon.Network, prefix string, rel relpos.Relations, space float32) []axon.AxonLayer

AddTDLayersPy adds the standard TD temporal differences layers, generating a DA signal. The projection from Rew to RewInteg is given class TDRewToInteg; it should have no learning and a weight of 1. Py is the Python version, which returns the layers as a slice.

func SetNeuronExtPosNeg added in v1.4.14

func SetNeuronExtPosNeg(nrn *axon.Neuron, ni int, val float32)

SetNeuronExtPosNeg sets the neuron Ext value based on the neuron index, with positive values going into the first unit and negative values rectified to positive in the second unit.
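
The positive/negative split can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the package's code: `extPosNeg` is a hypothetical helper that returns the value instead of writing to an `axon.Neuron`, and it assumes exactly two units with index 0 for positive and any other index for negative.

```go
package main

import "fmt"

// extPosNeg returns the external input for unit ni given a signed value.
// Assumption for this sketch: unit 0 carries positive values, and the
// other unit carries the magnitude of negative values.
func extPosNeg(ni int, val float32) float32 {
	if ni == 0 {
		if val >= 0 {
			return val
		}
		return 0
	}
	if val < 0 {
		return -val // rectify the negative value to a positive drive
	}
	return 0
}

func main() {
	for _, val := range []float32{0.75, -0.5} {
		fmt.Println(val, "->", extPosNeg(0, val), extPosNeg(1, val))
	}
}
```

At most one of the two units receives non-zero input for any given value, so the sign can be recovered from which unit is driven.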

Types

type AChLayer

type AChLayer interface {
	// GetACh returns the acetylcholine level for layer
	GetACh() float32

	// SetACh sets the acetylcholine level for layer
	SetACh(ach float32)
}

AChLayer is an interface for a layer with acetylcholine neuromodulator on it

type ClampAChLayer

type ClampAChLayer struct {
	axon.Layer
	SendACh SendACh `desc:"list of layers to send acetylcholine to"`
	ACh     float32 `desc:"acetylcholine value for this layer"`
}

ClampAChLayer is an Input layer that just sends its activity as the acetylcholine signal

func (*ClampAChLayer) Build

func (ly *ClampAChLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*ClampAChLayer) CyclePost

func (ly *ClampAChLayer) CyclePost(ltime *axon.Time)

CyclePost is called at the end of each Cycle. We use it to send ACh, which will then be active for the next cycle of processing.

func (*ClampAChLayer) GetACh

func (ly *ClampAChLayer) GetACh() float32

func (*ClampAChLayer) SetACh

func (ly *ClampAChLayer) SetACh(ach float32)

type ClampDaLayer

type ClampDaLayer struct {
	Layer
	SendDA SendDA `desc:"list of layers to send dopamine to"`
}

ClampDaLayer is an Input layer that just sends its activity as the dopamine signal

func AddClampDaLayer

func AddClampDaLayer(nt *axon.Network, name string) *ClampDaLayer

AddClampDaLayer adds a ClampDaLayer of given name

func (*ClampDaLayer) ActFmG added in v1.5.1

func (ly *ClampDaLayer) ActFmG(ltime *axon.Time)

func (*ClampDaLayer) Build

func (ly *ClampDaLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*ClampDaLayer) CyclePost

func (ly *ClampDaLayer) CyclePost(ltime *axon.Time)

CyclePost is called at the end of each Cycle. We use it to send DA, which will then be active for the next cycle of processing.

func (*ClampDaLayer) Defaults

func (ly *ClampDaLayer) Defaults()

type DALayer

type DALayer interface {
	// GetDA returns the dopamine level for layer
	GetDA() float32

	// SetDA sets the dopamine level for layer
	SetDA(da float32)
}

DALayer is an interface for a layer with dopamine neuromodulator on it
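
Any type with matching `GetDA` and `SetDA` methods satisfies this interface. The sketch below restates the interface locally so that it compiles without axon; `toyLayer` is a hypothetical type used only for illustration.

```go
package main

import "fmt"

// DALayer restates the interface documented above, so the sketch is standalone.
type DALayer interface {
	GetDA() float32
	SetDA(da float32)
}

// toyLayer is a hypothetical layer that stores a single dopamine value.
type toyLayer struct {
	Name string
	DA   float32
}

func (ly *toyLayer) GetDA() float32   { return ly.DA }
func (ly *toyLayer) SetDA(da float32) { ly.DA = da }

// Compile-time check that *toyLayer implements DALayer.
var _ DALayer = (*toyLayer)(nil)

func main() {
	var ly DALayer = &toyLayer{Name: "Hidden"}
	ly.SetDA(0.5)
	fmt.Println(ly.GetDA())
}
```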

type Layer added in v1.4.14

type Layer struct {
	axon.Layer
	DA float32 `inactive:"+" desc:"dopamine value for this layer"`
}

Layer is the base layer type for RL framework. Adds a dopamine variable to base Axon layer type.

func (*Layer) Class added in v1.5.10

func (ly *Layer) Class() string

func (*Layer) Defaults added in v1.5.10

func (ly *Layer) Defaults()

func (*Layer) GetDA added in v1.4.14

func (ly *Layer) GetDA() float32

func (*Layer) InitActs added in v1.4.14

func (ly *Layer) InitActs()

func (*Layer) SetDA added in v1.4.14

func (ly *Layer) SetDA(da float32)

func (*Layer) UnitVal1D added in v1.4.14

func (ly *Layer) UnitVal1D(varIdx int, idx int) float32

UnitVal1D returns the value of the given variable index on the given unit, using a 1-dimensional index. It returns NaN on an invalid index. This is the core unit var access method used by other methods, so it is the only one that needs to be updated for derived layer types.

func (*Layer) UnitVarIdx added in v1.4.14

func (ly *Layer) UnitVarIdx(varNm string) (int, error)

UnitVarIdx returns the index of given variable within the Neuron, according to UnitVarNames() list (using a map to lookup index), or -1 and error message if not found.

func (*Layer) UnitVarNum added in v1.4.14

func (ly *Layer) UnitVarNum() int

UnitVarNum returns the number of Neuron-level variables for this layer. This is needed for extending indexes in derived types.
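
One way a derived layer can extend the variable indexes is to place its extra variables after the base ones and delegate lower indexes to the embedded type. The sketch below is an assumption about the general pattern, not the package's code: `baseLayer` and `daLayer` are hypothetical types, and the base layer is reduced to a single variable.

```go
package main

import (
	"fmt"
	"math"
)

// nan32 and isNaN are small helpers for float32 NaN handling.
func nan32() float32       { return float32(math.NaN()) }
func isNaN(v float32) bool { return math.IsNaN(float64(v)) }

// baseLayer is a hypothetical base layer exposing one variable (index 0).
type baseLayer struct {
	Acts []float32
}

func (b *baseLayer) UnitVarNum() int { return 1 }

func (b *baseLayer) UnitVal1D(varIdx, idx int) float32 {
	if varIdx != 0 || idx < 0 || idx >= len(b.Acts) {
		return nan32()
	}
	return b.Acts[idx]
}

// daLayer embeds baseLayer and appends one extra variable after the base ones.
type daLayer struct {
	baseLayer
	DA float32
}

func (ly *daLayer) UnitVarNum() int { return ly.baseLayer.UnitVarNum() + 1 }

func (ly *daLayer) UnitVal1D(varIdx, idx int) float32 {
	nbase := ly.baseLayer.UnitVarNum()
	if varIdx < nbase {
		return ly.baseLayer.UnitVal1D(varIdx, idx) // base does its own checks
	}
	if varIdx > nbase || idx < 0 || idx >= len(ly.Acts) {
		return nan32()
	}
	return ly.DA // layer-level value, the same for every unit
}

func main() {
	ly := &daLayer{baseLayer: baseLayer{Acts: []float32{0.25, 0.5}}, DA: 0.75}
	fmt.Println(ly.UnitVarNum(), ly.UnitVal1D(0, 1), ly.UnitVal1D(1, 0))
	fmt.Println(isNaN(ly.UnitVal1D(2, 0)))
}
```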

type LayerType added in v1.5.10

type LayerType deep.LayerType

LayerType has the extensions to the emer.LayerType types, for gui

const (
	RL_ LayerType = LayerType(deep.LayerTypeN) + iota
	LayerTypeN
)

gui versions

func StringToLayerType added in v1.5.10

func StringToLayerType(s string) (LayerType, error)

func (LayerType) String added in v1.5.10

func (i LayerType) String() string

type Network added in v1.4.14

type Network struct {
	axon.Network
}

rl.Network enables display of the Da variable for pure rl models

func (*Network) AddClampDaLayer added in v1.4.14

func (nt *Network) AddClampDaLayer(name string) *ClampDaLayer

AddClampDaLayer adds a ClampDaLayer of given name

func (*Network) AddRWLayers added in v1.4.14

func (nt *Network) AddRWLayers(prefix string, rel relpos.Relations, space float32) (rew, rp, da axon.AxonLayer)

AddRWLayers adds a simple Rescorla-Wagner (PV only) dopamine system, with a primary Reward layer, a RWPred prediction layer, and a dopamine layer that computes the difference between them. DA is only generated when the Rew layer has external input; otherwise it is zero.

func (*Network) AddTDLayers added in v1.4.14

func (nt *Network) AddTDLayers(prefix string, rel relpos.Relations, space float32) (rew, rp, ri, td axon.AxonLayer)

AddTDLayers adds the standard TD temporal differences layers, generating a DA signal. The projection from Rew to RewInteg is given class TDRewToInteg; it should have no learning and a weight of 1.

func (*Network) UnitVarNames added in v1.4.14

func (nt *Network) UnitVarNames() []string

UnitVarNames returns a list of variable names available on the units in this layer

type RWDaLayer

type RWDaLayer struct {
	Layer
	SendDA    SendDA `desc:"list of layers to send dopamine to"`
	RewLay    string `desc:"name of Reward-representing layer from which this computes DA -- if nothing clamped, no dopamine computed"`
	RWPredLay string `desc:"name of RWPredLayer layer that is subtracted from the reward value"`
}

RWDaLayer computes a dopamine (DA) signal based on a simple Rescorla-Wagner learning dynamic (i.e., PV learning in the PVLV framework). It computes the difference between r(t) and the RWPred values. r(t) is accessed directly from a Rew layer; if that layer has no external input then no DA is computed, which is critical for effective use of RW only for PV cases. The RWPred prediction is also accessed directly, from the RWPred layer, to avoid any issues.
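
The gating on external reward input can be written as a small function. This is a sketch of the rule as described, not the layer's implementation: `rwDA` and its `hasRew` flag are hypothetical, standing in for whatever check the layer makes on the Rew layer's external input.

```go
package main

import "fmt"

// rwDA returns the Rescorla-Wagner dopamine signal: reward minus prediction.
// hasRew is a hypothetical flag indicating that the reward layer received
// external input on this trial; without it, no dopamine is generated.
func rwDA(hasRew bool, rew, pred float32) float32 {
	if !hasRew {
		return 0
	}
	return rew - pred
}

func main() {
	fmt.Println(rwDA(true, 1, 0.25))  // reward larger than predicted
	fmt.Println(rwDA(true, 0, 0.25))  // predicted reward omitted
	fmt.Println(rwDA(false, 0, 0.25)) // no reward input: no DA
}
```

The distinction between the last two calls matters: an explicitly clamped reward of zero produces a negative prediction error, while the absence of any reward input produces no signal at all.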

func (*RWDaLayer) ActFmG

func (ly *RWDaLayer) ActFmG(ltime *axon.Time)

func (*RWDaLayer) Build

func (ly *RWDaLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*RWDaLayer) CyclePost

func (ly *RWDaLayer) CyclePost(ltime *axon.Time)

CyclePost is called at the end of each Cycle. We use it to send DA, which will then be active for the next cycle of processing.

func (*RWDaLayer) Defaults

func (ly *RWDaLayer) Defaults()

func (*RWDaLayer) RWLayers

func (ly *RWDaLayer) RWLayers() (*axon.Layer, *RWPredLayer, error)

RWLayers returns the reward and RWPred layers based on names

type RWPredLayer

type RWPredLayer struct {
	Layer
	PredRange minmax.F32 `` /* 180-byte string literal not displayed */
}

RWPredLayer computes reward prediction for a simple Rescorla-Wagner learning dynamic (i.e., PV learning in the PVLV framework). Activity is computed as a linear function of excitatory conductance (which can be negative; there are no constraints). Use with RWPrjn, which does simple delta-rule learning on minus-plus.

func (*RWPredLayer) ActFmG

func (ly *RWPredLayer) ActFmG(ltime *axon.Time)

func (*RWPredLayer) Defaults

func (ly *RWPredLayer) Defaults()

type RWPrjn

type RWPrjn struct {
	axon.Prjn
	DaTol        float32 `` /* 208-byte string literal not displayed */
	OppSignLRate float32 `desc:"how much to learn on opposite DA sign coding neuron (0..1)"`
}

RWPrjn does dopamine-modulated learning for reward prediction: Da * Send.Act. It is typically used in RWPredLayer to generate reward predictions. It has no weight bounds or limits on sign.
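
The Da * Send.Act rule can be traced on a single linear prediction unit. The sketch below is a simplification under stated assumptions: `rwTrial` and its `lrate` parameter are hypothetical, it ignores the two-unit sign coding, and it does not model the DaTol or OppSignLRate fields.

```go
package main

import "fmt"

// rwTrial runs one Rescorla-Wagner trial for a single linear prediction unit.
// w is the current weight, sendAct the sending activity, rew the reward, and
// lrate a learning rate (a hypothetical parameter for this sketch).
// It returns the dopamine signal and the updated weight.
func rwTrial(w, sendAct, rew, lrate float32) (da, newW float32) {
	pred := w * sendAct // linear prediction, with no bounds on sign
	da = rew - pred     // prediction error used as dopamine
	dwt := lrate * da * sendAct
	return da, w + dwt
}

func main() {
	var w float32
	for trial := 0; trial < 3; trial++ {
		var da float32
		da, w = rwTrial(w, 1, 1, 0.5)
		fmt.Println(trial, da, w)
	}
}
```

With a constant reward of 1 and a learning rate of 0.5, the weight moves through 0.5, 0.75, and 0.875 while the dopamine signal halves on each trial, which is the expected behavior of a delta rule as the prediction converges on the reward.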

func (*RWPrjn) DWt

func (pj *RWPrjn) DWt(ltime *axon.Time)

DWt computes the weight change (learning) -- on sending projections.

func (*RWPrjn) Defaults

func (pj *RWPrjn) Defaults()

func (*RWPrjn) WtFmDWt

func (pj *RWPrjn) WtFmDWt(ltime *axon.Time)

WtFmDWt updates the synaptic weight values from delta-weight changes -- on sending projections

type RewLayer added in v1.4.14

type RewLayer struct {
	Layer
}

RewLayer represents positive or negative reward values across 2 units, showing spiking rates for each, and Act always represents signed value.

func (*RewLayer) ActFmG added in v1.4.14

func (ly *RewLayer) ActFmG(ltime *axon.Time)

ActFmG computes rate-code activation from Ge, Gi, Gl conductances and updates learning running-average activations from that Act

func (*RewLayer) Defaults added in v1.4.14

func (ly *RewLayer) Defaults()

func (*RewLayer) GFmInc added in v1.4.14

func (ly *RewLayer) GFmInc(ltime *axon.Time)

type SendACh

type SendACh emer.LayNames

SendACh is a list of layers to send acetylcholine to

func (*SendACh) Add

func (sd *SendACh) Add(laynm ...string)

Add adds given layer name(s) to list

func (*SendACh) AddAllBut

func (sd *SendACh) AddAllBut(net emer.Network, excl ...string)

AddAllBut adds all layers in the network except those in the exclude list

func (*SendACh) AddOne

func (sd *SendACh) AddOne(laynm string)

AddOne adds one layer name to list -- python version -- doesn't support varargs

func (*SendACh) SendACh

func (sd *SendACh) SendACh(net emer.Network, ach float32)

SendACh sends acetylcholine to list of layers

func (*SendACh) Validate

func (sd *SendACh) Validate(net emer.Network, ctxt string) error

Validate ensures that LayNames layers are valid. ctxt is string for error message to provide context.

type SendDA

type SendDA emer.LayNames

SendDA is a list of layers to send dopamine to

func (*SendDA) Add

func (sd *SendDA) Add(laynm ...string)

Add adds given layer name(s) to list

func (*SendDA) AddAllBut

func (sd *SendDA) AddAllBut(net emer.Network, excl ...string)

AddAllBut adds all layers in the network except those in the exclude list

func (*SendDA) AddOne

func (sd *SendDA) AddOne(laynm string)

AddOne adds one layer name to list -- python version -- doesn't support varargs

func (*SendDA) SendDA

func (sd *SendDA) SendDA(net emer.Network, da float32)

SendDA sends dopamine to list of layers
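
The idea of a name list that dispatches dopamine to layers can be sketched without axon. Everything here is a hypothetical stand-in: `toyNet` replaces the `emer.Network` lookup by name, `sendDA` mirrors only the `Add` and send behavior, and layers that are missing from the network are silently skipped, which may differ from the real method.

```go
package main

import "fmt"

// DALayer restates the interface documented in this package.
type DALayer interface {
	GetDA() float32
	SetDA(da float32)
}

// toyLayer is a hypothetical layer holding a dopamine value.
type toyLayer struct{ DA float32 }

func (ly *toyLayer) GetDA() float32   { return ly.DA }
func (ly *toyLayer) SetDA(da float32) { ly.DA = da }

// toyNet is a hypothetical stand-in for looking up layers by name.
type toyNet map[string]DALayer

// sendDA is a list of layer names to send dopamine to.
type sendDA []string

// Add appends the given layer name(s) to the list.
func (sd *sendDA) Add(laynm ...string) { *sd = append(*sd, laynm...) }

// Send sets the dopamine value on every listed layer found in the network.
func (sd *sendDA) Send(net toyNet, da float32) {
	for _, nm := range *sd {
		if ly, ok := net[nm]; ok {
			ly.SetDA(da)
		}
	}
}

func main() {
	a, b := &toyLayer{}, &toyLayer{}
	net := toyNet{"A": a, "B": b}
	var sd sendDA
	sd.Add("A")
	sd.Send(net, 0.5)
	fmt.Println(a.DA, b.DA)
}
```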

func (*SendDA) Validate

func (sd *SendDA) Validate(net emer.Network, ctxt string) error

Validate ensures that LayNames layers are valid. ctxt is string for error message to provide context.

type TDDaLayer

type TDDaLayer struct {
	Layer
	SendDA   SendDA `desc:"list of layers to send dopamine to"`
	RewInteg string `desc:"name of TDRewIntegLayer from which this computes the temporal derivative"`
}

TDDaLayer computes a dopamine (DA) signal as the temporal difference (TD) between the TDRewIntegLayer activations in the minus and plus phase.

func (*TDDaLayer) ActFmG

func (ly *TDDaLayer) ActFmG(ltime *axon.Time)

func (*TDDaLayer) Build

func (ly *TDDaLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*TDDaLayer) CyclePost

func (ly *TDDaLayer) CyclePost(ltime *axon.Time)

CyclePost is called at the end of each Cycle. We use it to send DA, which will then be active for the next cycle of processing.

func (*TDDaLayer) Defaults

func (ly *TDDaLayer) Defaults()

func (*TDDaLayer) GFmInc added in v1.4.14

func (ly *TDDaLayer) GFmInc(ltime *axon.Time)

func (*TDDaLayer) RewIntegDA added in v1.4.14

func (ly *TDDaLayer) RewIntegDA(ltime *axon.Time) float32

func (*TDDaLayer) RewIntegLayer

func (ly *TDDaLayer) RewIntegLayer() (*TDRewIntegLayer, error)

type TDRewIntegLayer

type TDRewIntegLayer struct {
	Layer
	RewInteg TDRewIntegParams `desc:"parameters for reward integration"`
}

TDRewIntegLayer is the temporal differences reward integration layer. It represents estimated value V(t) in the minus phase, and estimated V(t+1) + r(t) in the plus phase. It directly accesses r(t) from the Rew layer, and V(t) from the RewPred layer.
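
The two phases, and the dopamine signal that TDDaLayer derives from them, can be worked through numerically. This is a sketch under assumptions: `tdParams` is a hypothetical type whose fields are named after the Discount and RewPredGain parameters of TDRewIntegParams, and the way the gain and discount are combined here is a guess at a reasonable form rather than a statement of the layer's exact computation. The values are chosen for exact arithmetic and are not the package defaults.

```go
package main

import "fmt"

// tdParams is a hypothetical parameter set for this sketch.
type tdParams struct {
	Discount    float32 // how much to discount the future prediction
	RewPredGain float32 // gain on reward prediction activations
}

// minus returns the minus-phase integrator value: the current estimate V(t).
func (p tdParams) minus(vT float32) float32 {
	return p.RewPredGain * vT
}

// plus returns the plus-phase value: discounted V(t+1) plus reward r(t).
func (p tdParams) plus(vNext, rew float32) float32 {
	return p.Discount*p.RewPredGain*vNext + rew
}

// tdDA is the temporal difference between plus and minus phase values.
func tdDA(minus, plus float32) float32 {
	return plus - minus
}

func main() {
	p := tdParams{Discount: 0.5, RewPredGain: 1}
	m := p.minus(0.25)
	pl := p.plus(0.5, 1)
	fmt.Println(m, pl, tdDA(m, pl))
}
```

When the prediction is accurate, the plus-phase and minus-phase values match and the dopamine signal is zero; a positive difference indicates that outcomes were better than predicted.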

func (*TDRewIntegLayer) ActFmG

func (ly *TDRewIntegLayer) ActFmG(ltime *axon.Time)

func (*TDRewIntegLayer) Build

func (ly *TDRewIntegLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*TDRewIntegLayer) Defaults

func (ly *TDRewIntegLayer) Defaults()

func (*TDRewIntegLayer) GFmInc added in v1.4.14

func (ly *TDRewIntegLayer) GFmInc(ltime *axon.Time)

func (*TDRewIntegLayer) RewLayer added in v1.4.14

func (ly *TDRewIntegLayer) RewLayer() (*RewLayer, error)

func (*TDRewIntegLayer) RewPredAct added in v1.4.14

func (ly *TDRewIntegLayer) RewPredAct(ltime *axon.Time) float32

func (*TDRewIntegLayer) RewPredLayer

func (ly *TDRewIntegLayer) RewPredLayer() (*TDRewPredLayer, error)

type TDRewIntegParams

type TDRewIntegParams struct {
	Discount    float32 `desc:"discount factor -- how much to discount the future prediction from RewPred"`
	RewPredGain float32 `desc:"gain factor on rew pred activations"`
	RewPred     string  `desc:"name of TDRewPredLayer to get reward prediction from "`
	Rew         string  `desc:"name of RewLayer to get current reward from "`
}

TDRewIntegParams are params for reward integrator layer

func (*TDRewIntegParams) Defaults

func (tp *TDRewIntegParams) Defaults()

type TDRewPredLayer

type TDRewPredLayer struct {
	Layer
}

TDRewPredLayer is the temporal differences reward prediction layer. It represents estimated value V(t) in the minus phase, and computes estimated V(t+1) based on its learned weights in plus phase. Use TDRewPredPrjn for DA modulated learning.

func (*TDRewPredLayer) ActFmG

func (ly *TDRewPredLayer) ActFmG(ltime *axon.Time)

func (*TDRewPredLayer) Defaults added in v1.4.14

func (ly *TDRewPredLayer) Defaults()

type TDRewPredPrjn

type TDRewPredPrjn struct {
	axon.Prjn
	OppSignLRate float32 `desc:"how much to learn on opposite DA sign coding neuron (0..1)"`
}

TDRewPredPrjn does dopamine-modulated learning for reward prediction: DWt = Da * Send.SpkPrv (activity on the *previous* timestep). It is typically used in TDRewPredLayer to generate reward predictions. If the Da sign is positive, the first recv unit learns fully; if negative, the second one learns fully. A lower lrate applies in the opposite cases. Weights are positive-only.
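
The sign-dependent learning rates can be sketched as follows. This is an illustration under assumptions, not the projection's code: `tdDWt` and `applyDWt` are hypothetical helpers, `lrate` is an assumed base learning rate, and the sign flip for the second unit (so that negative dopamine strengthens the negative-coding unit) is an inference from the positive-only weights rather than something stated in the documentation.

```go
package main

import "fmt"

// tdDWt returns the weight change for receiving unit ri (0 = positive-coding,
// 1 = negative-coding) given dopamine da and previous-timestep sending
// activity sendPrv. oppSignLRate scales learning on the unit whose sign
// coding is opposite to the sign of da.
func tdDWt(ri int, da, sendPrv, lrate, oppSignLRate float32) float32 {
	eff := lrate
	if ri == 0 {
		if da < 0 {
			eff *= oppSignLRate
		}
	} else {
		eff = -eff // assumption: negative DA increases the second unit's weights
		if da >= 0 {
			eff *= oppSignLRate
		}
	}
	return eff * da * sendPrv
}

// applyDWt adds dwt to wt, keeping the weight positive-only.
func applyDWt(wt, dwt float32) float32 {
	wt += dwt
	if wt < 0 {
		wt = 0
	}
	return wt
}

func main() {
	for _, da := range []float32{0.5, -0.5} {
		fmt.Println(da, tdDWt(0, da, 1, 1, 0.25), tdDWt(1, da, 1, 1, 0.25))
	}
	fmt.Println(applyDWt(0.1, -0.5))
}
```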

func (*TDRewPredPrjn) DWt

func (pj *TDRewPredPrjn) DWt(ltime *axon.Time)

DWt computes the weight change (learning) -- on sending projections.

func (*TDRewPredPrjn) Defaults

func (pj *TDRewPredPrjn) Defaults()

func (*TDRewPredPrjn) WtFmDWt

func (pj *TDRewPredPrjn) WtFmDWt(ltime *axon.Time)

WtFmDWt updates the synaptic weight values from delta-weight changes -- on sending projections
