Documentation ¶
Overview ¶
Package rl provides core infrastructure for dopamine neuromodulation and reinforcement learning, including the Rescorla-Wagner (RW) and Temporal Differences (TD) learning algorithms, and a minimal `ClampDaLayer` that can be used to send an arbitrary DA signal.
- `da.go` defines a simple `DALayer` interface for getting and setting dopamine values, and a `SendDA` list of layer names with convenience methods for sending dopamine to any layer that implements the `DALayer` interface (a minimal usage sketch follows this list).
- The RW and TD DA layers use the `CyclePost` layer-level method to send the DA to other layers at the end of each cycle, after activation is updated. Thus, DA lags by 1 cycle, which typically should not be a problem.
- See the separate `pvlv` package for the full biologically based PVLV model built on top of this basic DA infrastructure.
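For example, here is a minimal sketch of clamping an arbitrary DA signal and broadcasting it to another layer; the `net` variable (a `*leabra.Network`) and the "M1" layer name are assumed for illustration:

da := rl.AddClampDaLayer(net, "DA") // add a ClampDaLayer named "DA" to the network
da.SendDA.AddOne("M1")              // send the clamped DA value to layer "M1" at each CyclePost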
Index ¶
- Variables
- func AddRWLayers(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) (rew, rp, da leabra.LeabraLayer)
- func AddRWLayersPy(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) []leabra.LeabraLayer
- func AddTDLayers(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) (rew, rp, ri, td leabra.LeabraLayer)
- func AddTDLayersPy(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) []leabra.LeabraLayer
- type AChLayer
- type ClampAChLayer
- type ClampDaLayer
- type DALayer
- type RWDaLayer
- func (ly *RWDaLayer) ActFmG(ltime *leabra.Time)
- func (ly *RWDaLayer) Build() error
- func (ly *RWDaLayer) CyclePost(ltime *leabra.Time)
- func (ly *RWDaLayer) Defaults()
- func (ly *RWDaLayer) GetDA() float32
- func (ly *RWDaLayer) RWLayers() (*leabra.Layer, *RWPredLayer, error)
- func (ly *RWDaLayer) SetDA(da float32)
- type RWPredLayer
- type RWPrjn
- type SendACh
- type SendDA
- type TDDaLayer
- func (ly *TDDaLayer) ActFmG(ltime *leabra.Time)
- func (ly *TDDaLayer) Build() error
- func (ly *TDDaLayer) CyclePost(ltime *leabra.Time)
- func (ly *TDDaLayer) Defaults()
- func (ly *TDDaLayer) GetDA() float32
- func (ly *TDDaLayer) RewIntegLayer() (*TDRewIntegLayer, error)
- func (ly *TDDaLayer) SetDA(da float32)
- type TDRewIntegLayer
- type TDRewIntegParams
- type TDRewPredLayer
- type TDRewPredPrjn
Constants ¶
This section is empty.
Variables ¶
var KiT_ClampAChLayer = kit.Types.AddType(&ClampAChLayer{}, leabra.LayerProps)
var KiT_ClampDaLayer = kit.Types.AddType(&ClampDaLayer{}, leabra.LayerProps)
var KiT_RWDaLayer = kit.Types.AddType(&RWDaLayer{}, deep.LayerProps)
var KiT_RWPredLayer = kit.Types.AddType(&RWPredLayer{}, leabra.LayerProps)
var KiT_TDDaLayer = kit.Types.AddType(&TDDaLayer{}, leabra.LayerProps)
var KiT_TDRewIntegLayer = kit.Types.AddType(&TDRewIntegLayer{}, leabra.LayerProps)
var KiT_TDRewPredLayer = kit.Types.AddType(&TDRewPredLayer{}, leabra.LayerProps)
var KiT_TDRewPredPrjn = kit.Types.AddType(&TDRewPredPrjn{}, deep.PrjnProps)
Functions ¶
func AddRWLayers ¶
func AddRWLayers(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) (rew, rp, da leabra.LeabraLayer)
AddRWLayers adds a simple Rescorla-Wagner (PV only) dopamine system, with a primary Reward layer, an RWPred prediction layer, and a dopamine layer that computes the difference between them. It only generates DA when the Rew layer has external input -- otherwise DA is zero.
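A minimal usage sketch, assuming an existing `net *leabra.Network`; the empty prefix and the `relpos.Behind` placement with 2 units of space are illustrative choices:

rew, rp, da := rl.AddRWLayers(net, "", relpos.Behind, 2)
// rew receives the external reward r(t); rp is the RWPred prediction layer;
// da computes the r(t) - prediction difference and broadcasts it via its SendDA list.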
func AddRWLayersPy ¶ added in v1.1.15
func AddRWLayersPy(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) []leabra.LeabraLayer
AddRWLayersPy adds a simple Rescorla-Wagner (PV only) dopamine system, with a primary Reward layer, an RWPred prediction layer, and a dopamine layer that computes the difference between them. It only generates DA when the Rew layer has external input -- otherwise DA is zero. This is the Python version, which returns the layers as a slice.
func AddTDLayers ¶
func AddTDLayers(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) (rew, rp, ri, td leabra.LeabraLayer)
AddTDLayers adds the standard TD temporal differences layers, generating a DA signal. The projection from Rew to RewInteg is given the class TDRewToInteg -- it should have no learning and a fixed weight of 1.
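A minimal usage sketch, assuming an existing `net *leabra.Network`; the placement arguments are illustrative:

rew, rp, ri, td := rl.AddTDLayers(net, "", relpos.Behind, 2)
// rew receives the external reward r(t); rp is the TDRewPred prediction layer;
// ri is the TDRewInteg integration layer; td computes the phase-difference DA signal.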
func AddTDLayersPy ¶ added in v1.1.15
func AddTDLayersPy(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) []leabra.LeabraLayer
AddTDLayersPy adds the standard TD temporal differences layers, generating a DA signal. The projection from Rew to RewInteg is given the class TDRewToInteg -- it should have no learning and a fixed weight of 1. This is the Python version, which returns the layers as a slice.
Types ¶
type AChLayer ¶ added in v1.1.0
type AChLayer interface {
	// GetACh returns the acetylcholine level for layer
	GetACh() float32

	// SetACh sets the acetylcholine level for layer
	SetACh(ach float32)
}
AChLayer is an interface for a layer with acetylcholine neuromodulator on it
type ClampAChLayer ¶ added in v1.1.0
type ClampAChLayer struct {
	leabra.Layer
	SendACh SendACh `desc:"list of layers to send acetylcholine to"`
	ACh     float32 `desc:"acetylcholine value for this layer"`
}
ClampAChLayer is an Input layer that just sends its activity as the acetylcholine signal
func (*ClampAChLayer) Build ¶ added in v1.1.0
func (ly *ClampAChLayer) Build() error
Build constructs the layer state, including calling Build on the projections.
func (*ClampAChLayer) CyclePost ¶ added in v1.1.0
func (ly *ClampAChLayer) CyclePost(ltime *leabra.Time)
CyclePost is called at the end of Cycle. We use it to send ACh, which will then be active for the next cycle of processing.
func (*ClampAChLayer) GetACh ¶ added in v1.1.0
func (ly *ClampAChLayer) GetACh() float32
func (*ClampAChLayer) SetACh ¶ added in v1.1.0
func (ly *ClampAChLayer) SetACh(ach float32)
type ClampDaLayer ¶
type ClampDaLayer struct {
	leabra.Layer
	SendDA SendDA  `desc:"list of layers to send dopamine to"`
	DA     float32 `desc:"dopamine value for this layer"`
}
ClampDaLayer is an Input layer that just sends its activity as the dopamine signal
func AddClampDaLayer ¶
func AddClampDaLayer(nt *leabra.Network, name string) *ClampDaLayer
AddClampDaLayer adds a ClampDaLayer of given name
func (*ClampDaLayer) Build ¶
func (ly *ClampDaLayer) Build() error
Build constructs the layer state, including calling Build on the projections.
func (*ClampDaLayer) CyclePost ¶
func (ly *ClampDaLayer) CyclePost(ltime *leabra.Time)
CyclePost is called at the end of Cycle. We use it to send DA, which will then be active for the next cycle of processing.
func (*ClampDaLayer) Defaults ¶ added in v1.1.2
func (ly *ClampDaLayer) Defaults()
func (*ClampDaLayer) GetDA ¶
func (ly *ClampDaLayer) GetDA() float32
func (*ClampDaLayer) SetDA ¶
func (ly *ClampDaLayer) SetDA(da float32)
type DALayer ¶
type DALayer interface {
	// GetDA returns the dopamine level for layer
	GetDA() float32

	// SetDA sets the dopamine level for layer
	SetDA(da float32)
}
DALayer is an interface for a layer with dopamine neuromodulator on it
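For example, a sketch of sending dopamine through this interface; the `ly` variable and the value 0.8 are illustrative:

// Any layer implementing DALayer can receive the neuromodulator via a type assertion.
if dly, ok := ly.(rl.DALayer); ok {
	dly.SetDA(0.8)
}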
type RWDaLayer ¶
type RWDaLayer struct {
	leabra.Layer
	SendDA    SendDA  `desc:"list of layers to send dopamine to"`
	RewLay    string  `desc:"name of Reward-representing layer from which this computes DA -- if nothing clamped, no dopamine computed"`
	RWPredLay string  `desc:"name of RWPredLayer layer that is subtracted from the reward value"`
	DA        float32 `inactive:"+" desc:"dopamine value for this layer"`
}
RWDaLayer computes a dopamine (DA) signal based on a simple Rescorla-Wagner learning dynamic (i.e., PV learning in the PVLV framework). It computes the difference between r(t) and the RWPred value. r(t) is accessed directly from the Rew layer -- if there is no external input, no DA is computed -- which is critical for effective use of RW only for PV cases. The RWPred prediction is likewise accessed directly from the RWPred layer to avoid any issues.
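A sketch of the computation (not the actual `CyclePost` body; names are illustrative):

// rwDA returns the Rescorla-Wagner dopamine value: the prediction error
// r(t) - prediction, but only when the Rew layer has clamped external input.
func rwDA(rewHasExt bool, rew, pred float32) float32 {
	if !rewHasExt {
		return 0 // no external reward input -> no dopamine
	}
	return rew - pred
}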
func (*RWDaLayer) Build ¶
func (ly *RWDaLayer) Build() error
Build constructs the layer state, including calling Build on the projections.
func (*RWDaLayer) CyclePost ¶
func (ly *RWDaLayer) CyclePost(ltime *leabra.Time)
CyclePost is called at the end of Cycle. We use it to send DA, which will then be active for the next cycle of processing.
type RWPredLayer ¶
type RWPredLayer struct {
	leabra.Layer
	PredRange minmax.F32 `` /* 180-byte string literal not displayed */
	DA        float32    `inactive:"+" desc:"dopamine value for this layer"`
}
RWPredLayer computes the reward prediction for a simple Rescorla-Wagner learning dynamic (i.e., PV learning in the PVLV framework). Activity is computed as a linear function of the excitatory conductance (which can be negative -- there are no constraints). Use with RWPrjn, which does simple delta-rule learning on the minus-plus difference.
func (*RWPredLayer) ActFmG ¶
func (ly *RWPredLayer) ActFmG(ltime *leabra.Time)
ActFmG computes linear activation for RWPred
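A sketch of the idea, assuming (as the `PredRange` field name suggests) that the linear activation is clipped into that range; this is not the actual method body:

// rwPredAct sketches the RWPred activation: a linear function of the
// excitatory conductance ge, clipped into [lo, hi] (the PredRange).
func rwPredAct(ge, lo, hi float32) float32 {
	if ge < lo {
		return lo
	}
	if ge > hi {
		return hi
	}
	return ge
}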
func (*RWPredLayer) Defaults ¶
func (ly *RWPredLayer) Defaults()
func (*RWPredLayer) GetDA ¶
func (ly *RWPredLayer) GetDA() float32
func (*RWPredLayer) SetDA ¶
func (ly *RWPredLayer) SetDA(da float32)
type RWPrjn ¶
RWPrjn does dopamine-modulated learning for reward prediction: DWt = Da * Send.Act. Typically used in projections into RWPredLayer to generate reward predictions. Has no weight bounds or limits on sign, etc.
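A sketch of the weight change per synapse (`lrate` is the learning rate; this is not the actual `DWt` body):

// rwDWt sketches the RWPrjn delta rule: the dopamine prediction error
// times the sending unit's activity, with no weight bounds.
func rwDWt(lrate, da, sendAct float32) float32 {
	return lrate * da * sendAct
}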
type SendACh ¶ added in v1.1.0
SendACh is a list of layers to send acetylcholine to
func (*SendACh) AddAllBut ¶ added in v1.1.0
AddAllBut adds all layers in the network except those in the exclude list.
func (*SendACh) AddOne ¶ added in v1.1.15
AddOne adds one layer name to the list -- Python version, which doesn't support varargs.
type SendDA ¶
SendDA is a list of layers to send dopamine to
func (*SendDA) AddOne ¶ added in v1.1.15
AddOne adds one layer name to the list -- Python version, which doesn't support varargs.
type TDDaLayer ¶
type TDDaLayer struct {
	leabra.Layer
	SendDA   SendDA  `desc:"list of layers to send dopamine to"`
	RewInteg string  `desc:"name of TDRewIntegLayer from which this computes the temporal derivative"`
	DA       float32 `desc:"dopamine value for this layer"`
}
TDDaLayer computes a dopamine (DA) signal as the temporal difference (TD) between the TDRewIntegLayer activations in the minus and plus phase.
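A sketch of the signal (not the actual `CyclePost` body):

// tdDA sketches the TD dopamine computation: the difference between the
// RewInteg layer's plus-phase and minus-phase activations.
func tdDA(rewIntegPlus, rewIntegMinus float32) float32 {
	return rewIntegPlus - rewIntegMinus
}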
func (*TDDaLayer) Build ¶
func (ly *TDDaLayer) Build() error
Build constructs the layer state, including calling Build on the projections.
func (*TDDaLayer) CyclePost ¶
func (ly *TDDaLayer) CyclePost(ltime *leabra.Time)
CyclePost is called at the end of Cycle. We use it to send DA, which will then be active for the next cycle of processing.
func (*TDDaLayer) RewIntegLayer ¶
func (ly *TDDaLayer) RewIntegLayer() (*TDRewIntegLayer, error)
type TDRewIntegLayer ¶
type TDRewIntegLayer struct {
	leabra.Layer
	RewInteg TDRewIntegParams `desc:"parameters for reward integration"`
	DA       float32          `desc:"dopamine value for this layer"`
}
TDRewIntegLayer is the temporal differences reward integration layer. It represents the estimated value V(t) in the minus phase, and the estimated V(t+1) + r(t) in the plus phase. It computes r(t) from (typically fixed) weights from a reward layer, and directly accesses values from the RewPred layer.
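A sketch of the phase-wise value it represents, assuming the standard TD formulation described above; `discount` corresponds to the `Discount` parameter in TDRewIntegParams below:

// tdRewInteg sketches the TDRewIntegLayer value in each phase:
//   minus phase: V(t)                     (the prior reward prediction)
//   plus phase:  r(t) + discount * V(t+1) (reward plus discounted current prediction)
func tdRewInteg(plusPhase bool, rew, predPrev, predCur, discount float32) float32 {
	if !plusPhase {
		return predPrev
	}
	return rew + discount*predCur
}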
func (*TDRewIntegLayer) ActFmG ¶
func (ly *TDRewIntegLayer) ActFmG(ltime *leabra.Time)
func (*TDRewIntegLayer) Build ¶
func (ly *TDRewIntegLayer) Build() error
Build constructs the layer state, including calling Build on the projections.
func (*TDRewIntegLayer) Defaults ¶
func (ly *TDRewIntegLayer) Defaults()
func (*TDRewIntegLayer) GetDA ¶
func (ly *TDRewIntegLayer) GetDA() float32
func (*TDRewIntegLayer) RewPredLayer ¶
func (ly *TDRewIntegLayer) RewPredLayer() (*TDRewPredLayer, error)
func (*TDRewIntegLayer) SetDA ¶
func (ly *TDRewIntegLayer) SetDA(da float32)
type TDRewIntegParams ¶
type TDRewIntegParams struct {
	Discount float32 `desc:"discount factor -- how much to discount the future prediction from RewPred"`
	RewPred  string  `desc:"name of TDRewPredLayer to get reward prediction from"`
}
TDRewIntegParams are parameters for the reward integrator layer
func (*TDRewIntegParams) Defaults ¶
func (tp *TDRewIntegParams) Defaults()
type TDRewPredLayer ¶
type TDRewPredLayer struct {
	leabra.Layer
	DA float32 `inactive:"+" desc:"dopamine value for this layer"`
}
TDRewPredLayer is the temporal differences reward prediction layer. It represents the estimated value V(t) in the minus phase, and computes the estimated V(t+1) based on its learned weights in the plus phase. Use TDRewPredPrjn for DA-modulated learning.
func (*TDRewPredLayer) ActFmG ¶
func (ly *TDRewPredLayer) ActFmG(ltime *leabra.Time)
ActFmG computes linear activation for TDRewPred
func (*TDRewPredLayer) GetDA ¶
func (ly *TDRewPredLayer) GetDA() float32
func (*TDRewPredLayer) SetDA ¶
func (ly *TDRewPredLayer) SetDA(da float32)
type TDRewPredPrjn ¶
TDRewPredPrjn does dopamine-modulated learning for reward prediction: DWt = Da * Send.ActQ0 (the sending activity on the *previous* timestep). Typically used in projections into TDRewPredLayer to generate reward predictions. Has no weight bounds or limits on sign, etc.
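A sketch of the weight change per synapse, differing from the RWPrjn rule only in using the previous-timestep sending activity (`lrate` is the learning rate; this is not the actual `DWt` body):

// tdRewPredDWt sketches the TDRewPredPrjn rule: the dopamine signal times
// the sending unit's activity from the previous timestep (ActQ0), no bounds.
func tdRewPredDWt(lrate, da, sendActQ0 float32) float32 {
	return lrate * da * sendActQ0
}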
func (*TDRewPredPrjn) DWt ¶
func (pj *TDRewPredPrjn) DWt()
DWt computes the weight change (learning) -- on sending projections.
func (*TDRewPredPrjn) Defaults ¶
func (pj *TDRewPredPrjn) Defaults()
func (*TDRewPredPrjn) WtFmDWt ¶
func (pj *TDRewPredPrjn) WtFmDWt()
WtFmDWt updates the synaptic weight values from delta-weight changes -- on sending projections.