package rl
v1.1.50
Published: Apr 3, 2022 License: BSD-3-Clause Imports: 9 Imported by: 5

README

Reinforcement Learning and Dopamine

The rl package provides core infrastructure for dopamine neuromodulation and reinforcement learning, including the Rescorla-Wagner (RW) learning algorithm, Temporal Differences (TD) learning, and a minimal ClampDaLayer that can be used to send an arbitrary DA signal.

  • da.go defines a simple DALayer interface for getting and setting dopamine values, and a SendDA list of layer names with convenience methods for sending dopamine to any layer that implements the DALayer interface.

  • The RW and TD DA layers use the CyclePost layer-level method to send DA to other layers at the end of each cycle, after activation is updated. Thus, DA lags by 1 cycle, which typically should not be a problem.

  • See the separate pvlv package for the full biologically based pvlv model built on top of this basic DA infrastructure.

Documentation


Index

Constants

This section is empty.

Variables

var KiT_ClampAChLayer = kit.Types.AddType(&ClampAChLayer{}, leabra.LayerProps)
var KiT_ClampDaLayer = kit.Types.AddType(&ClampDaLayer{}, leabra.LayerProps)
var KiT_RWDaLayer = kit.Types.AddType(&RWDaLayer{}, deep.LayerProps)
var KiT_RWPredLayer = kit.Types.AddType(&RWPredLayer{}, leabra.LayerProps)
var KiT_RWPrjn = kit.Types.AddType(&RWPrjn{}, deep.PrjnProps)
var KiT_TDDaLayer = kit.Types.AddType(&TDDaLayer{}, leabra.LayerProps)
var KiT_TDRewIntegLayer = kit.Types.AddType(&TDRewIntegLayer{}, leabra.LayerProps)
var KiT_TDRewPredLayer = kit.Types.AddType(&TDRewPredLayer{}, leabra.LayerProps)
var KiT_TDRewPredPrjn = kit.Types.AddType(&TDRewPredPrjn{}, deep.PrjnProps)

Functions

func AddRWLayers

func AddRWLayers(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) (rew, rp, da leabra.LeabraLayer)

AddRWLayers adds a simple Rescorla-Wagner (PV-only) dopamine system, with a primary Reward layer, an RWPred prediction layer, and a dopamine layer that computes the difference between them. DA is only generated when the Rew layer has external input -- otherwise it is zero.
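
For example, the following minimal sketch wires an RW dopamine system into a new network (import paths assume the github.com/emer/leabra and github.com/emer/emergent modules for this version; the network name and layout choices are illustrative):

package main

import (
	"fmt"

	"github.com/emer/emergent/relpos"
	"github.com/emer/leabra/leabra"
	"github.com/emer/leabra/rl"
)

func main() {
	net := &leabra.Network{}
	net.InitName(net, "RWNet") // name is illustrative

	// Adds the Rew, RWPred, and DA layers, positioned behind one another.
	rew, rp, da := rl.AddRWLayers(net, "", relpos.Behind, 2)
	fmt.Println(rew.Name(), rp.Name(), da.Name())
}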

func AddRWLayersPy added in v1.1.15

func AddRWLayersPy(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) []leabra.LeabraLayer

AddRWLayersPy adds a simple Rescorla-Wagner (PV-only) dopamine system, with a primary Reward layer, an RWPred prediction layer, and a dopamine layer that computes the difference between them. DA is only generated when the Rew layer has external input -- otherwise it is zero. Py is the Python version; it returns the layers as a slice.

func AddTDLayers

func AddTDLayers(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) (rew, rp, ri, td leabra.LeabraLayer)

AddTDLayers adds the standard TD (temporal differences) layers, generating a DA signal. The projection from Rew to RewInteg is given the class TDRewToInteg -- it should have no learning and a weight of 1.
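
A corresponding sketch for the TD layers (same assumed import paths; setting the TDRewToInteg projection parameters via params is omitted here):

package main

import (
	"fmt"

	"github.com/emer/emergent/relpos"
	"github.com/emer/leabra/leabra"
	"github.com/emer/leabra/rl"
)

func main() {
	net := &leabra.Network{}
	net.InitName(net, "TDNet") // name is illustrative

	// Adds the Rew, RewPred, RewInteg, and TD (dopamine) layers.
	rew, rp, ri, td := rl.AddTDLayers(net, "", relpos.Behind, 2)
	fmt.Println(rew.Name(), rp.Name(), ri.Name(), td.Name())
}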

func AddTDLayersPy added in v1.1.15

func AddTDLayersPy(nt *leabra.Network, prefix string, rel relpos.Relations, space float32) []leabra.LeabraLayer

AddTDLayersPy adds the standard TD (temporal differences) layers, generating a DA signal. The projection from Rew to RewInteg is given the class TDRewToInteg -- it should have no learning and a weight of 1. Py is the Python version; it returns the layers as a slice.

Types

type AChLayer added in v1.1.0

type AChLayer interface {
	// GetACh returns the acetylcholine level for layer
	GetACh() float32

	// SetACh sets the acetylcholine level for layer
	SetACh(ach float32)
}

AChLayer is an interface for a layer with acetylcholine neuromodulator on it

type ClampAChLayer added in v1.1.0

type ClampAChLayer struct {
	leabra.Layer
	SendACh SendACh `desc:"list of layers to send acetylcholine to"`
	ACh     float32 `desc:"acetylcholine value for this layer"`
}

ClampAChLayer is an Input layer that just sends its activity as the acetylcholine signal

func (*ClampAChLayer) Build added in v1.1.0

func (ly *ClampAChLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*ClampAChLayer) CyclePost added in v1.1.0

func (ly *ClampAChLayer) CyclePost(ltime *leabra.Time)

CyclePost is called at the end of Cycle. We use it to send ACh, which will then be active for the next cycle of processing.

func (*ClampAChLayer) GetACh added in v1.1.0

func (ly *ClampAChLayer) GetACh() float32

func (*ClampAChLayer) SetACh added in v1.1.0

func (ly *ClampAChLayer) SetACh(ach float32)

type ClampDaLayer

type ClampDaLayer struct {
	leabra.Layer
	SendDA SendDA  `desc:"list of layers to send dopamine to"`
	DA     float32 `desc:"dopamine value for this layer"`
}

ClampDaLayer is an Input layer that just sends its activity as the dopamine signal

func AddClampDaLayer

func AddClampDaLayer(nt *leabra.Network, name string) *ClampDaLayer

AddClampDaLayer adds a ClampDaLayer of given name
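
A minimal sketch of clamping a DA signal (assuming the same import paths as above; the target layer name is illustrative and must refer to a layer that implements DALayer):

package main

import (
	"github.com/emer/leabra/leabra"
	"github.com/emer/leabra/rl"
)

func main() {
	net := &leabra.Network{}
	net.InitName(net, "Net") // name is illustrative

	// Add a ClampDaLayer and tell it which layers should receive its DA.
	da := rl.AddClampDaLayer(net, "DA")
	da.SendDA.Add("RWPred") // illustrative target; must implement DALayer

	// During running, the layer's clamped activity is sent as DA in CyclePost.
}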

func (*ClampDaLayer) Build

func (ly *ClampDaLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*ClampDaLayer) CyclePost

func (ly *ClampDaLayer) CyclePost(ltime *leabra.Time)

CyclePost is called at the end of Cycle. We use it to send DA, which will then be active for the next cycle of processing.

func (*ClampDaLayer) Defaults added in v1.1.2

func (ly *ClampDaLayer) Defaults()

func (*ClampDaLayer) GetDA

func (ly *ClampDaLayer) GetDA() float32

func (*ClampDaLayer) SetDA

func (ly *ClampDaLayer) SetDA(da float32)

type DALayer

type DALayer interface {
	// GetDA returns the dopamine level for layer
	GetDA() float32

	// SetDA sets the dopamine level for layer
	SetDA(da float32)
}

DALayer is an interface for a layer with dopamine neuromodulator on it
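
Any layer type can receive dopamine from SendDA by satisfying this interface; a minimal sketch (the type name is illustrative, and the leabra import path is assumed):

package main

import (
	"fmt"

	"github.com/emer/leabra/leabra"
)

// DAModLayer is an illustrative layer with a DA field, satisfying DALayer so
// that rl layers (RWDaLayer, TDDaLayer, ClampDaLayer) can send dopamine to it.
type DAModLayer struct {
	leabra.Layer
	DA float32 // dopamine value, set each cycle by a sending layer
}

func (ly *DAModLayer) GetDA() float32   { return ly.DA }
func (ly *DAModLayer) SetDA(da float32) { ly.DA = da }

func main() {
	ly := &DAModLayer{}
	ly.SetDA(0.25)
	fmt.Println(ly.GetDA()) // 0.25
}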

type RWDaLayer

type RWDaLayer struct {
	leabra.Layer
	SendDA    SendDA  `desc:"list of layers to send dopamine to"`
	RewLay    string  `desc:"name of Reward-representing layer from which this computes DA -- if nothing clamped, no dopamine computed"`
	RWPredLay string  `desc:"name of RWPredLayer layer that is subtracted from the reward value"`
	DA        float32 `inactive:"+" desc:"dopamine value for this layer"`
}

RWDaLayer computes a dopamine (DA) signal based on a simple Rescorla-Wagner learning dynamic (i.e., PV learning in the PVLV framework). It computes the difference between r(t) and the RWPred value. r(t) is accessed directly from the Rew layer -- if there is no external input, no DA is computed, which is critical for the effective use of RW only for PV cases. The RWPred prediction is likewise accessed directly from the RWPred layer to avoid any issues.
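
The core computation can be sketched independently of the layer machinery (a simplification of what RWDaLayer does each trial; the helper below is illustrative, not part of the package):

package main

import "fmt"

// rwDA sketches the Rescorla-Wagner dopamine signal: the difference between
// the received reward r and the prediction pred, but only when an external
// reward is actually present (hasRew); otherwise no DA is computed.
func rwDA(r, pred float32, hasRew bool) float32 {
	if !hasRew {
		return 0
	}
	return r - pred
}

func main() {
	fmt.Println(rwDA(1, 0.3, true))  // better than predicted
	fmt.Println(rwDA(0, 0.3, true))  // reward omitted
	fmt.Println(rwDA(0, 0.3, false)) // no reward input this trial
}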

func (*RWDaLayer) ActFmG

func (ly *RWDaLayer) ActFmG(ltime *leabra.Time)

func (*RWDaLayer) Build

func (ly *RWDaLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*RWDaLayer) CyclePost

func (ly *RWDaLayer) CyclePost(ltime *leabra.Time)

CyclePost is called at the end of Cycle. We use it to send DA, which will then be active for the next cycle of processing.

func (*RWDaLayer) Defaults

func (ly *RWDaLayer) Defaults()

func (*RWDaLayer) GetDA

func (ly *RWDaLayer) GetDA() float32

func (*RWDaLayer) RWLayers

func (ly *RWDaLayer) RWLayers() (*leabra.Layer, *RWPredLayer, error)

RWLayers returns the reward and RWPred layers based on names

func (*RWDaLayer) SetDA

func (ly *RWDaLayer) SetDA(da float32)

type RWPredLayer

type RWPredLayer struct {
	leabra.Layer
	PredRange minmax.F32 `` /* 180-byte string literal not displayed */
	DA        float32    `inactive:"+" desc:"dopamine value for this layer"`
}

RWPredLayer computes the reward prediction for a simple Rescorla-Wagner learning dynamic (i.e., PV learning in the PVLV framework). Activity is computed as a linear function of the excitatory conductance (which can be negative -- there are no constraints). Use with RWPrjn, which does simple delta-rule learning on the minus-plus difference.
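
A sketch of the linear activation described above (clipping into PredRange is an assumption about how that field is used; values are illustrative):

package main

import "fmt"

// rwPredAct sketches RWPredLayer activation: a linear function of the net
// excitatory input ge, assumed here to be clipped into PredRange (lo..hi).
func rwPredAct(ge, lo, hi float32) float32 {
	act := ge // linear -- no sigmoidal activation function
	if act < lo {
		act = lo
	} else if act > hi {
		act = hi
	}
	return act
}

func main() {
	fmt.Println(rwPredAct(0.4, 0.01, 0.99)) // within range
	fmt.Println(rwPredAct(1.2, 0.01, 0.99)) // clipped to upper bound
}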

func (*RWPredLayer) ActFmG

func (ly *RWPredLayer) ActFmG(ltime *leabra.Time)

ActFmG computes linear activation for RWPred

func (*RWPredLayer) Defaults

func (ly *RWPredLayer) Defaults()

func (*RWPredLayer) GetDA

func (ly *RWPredLayer) GetDA() float32

func (*RWPredLayer) SetDA

func (ly *RWPredLayer) SetDA(da float32)

type RWPrjn

type RWPrjn struct {
	leabra.Prjn
	DaTol float32 `` /* 208-byte string literal not displayed */
}

RWPrjn does dopamine-modulated learning for reward prediction: DWt = Da * Send.Act. Use in RWPredLayer, typically to generate reward predictions. Has no weight bounds or limits on sign, etc.
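
As a sketch, the weight change amounts to the following (folding in an illustrative learning-rate factor, which is an assumption from standard leabra learning; the helper is not part of the package):

package main

import "fmt"

// rwDWt sketches the RWPrjn weight change: dopamine-modulated delta rule,
// dwt = lrate * da * sendAct, with no bounds on weight sign or magnitude.
func rwDWt(lrate, da, sendAct float32) float32 {
	return lrate * da * sendAct
}

func main() {
	fmt.Println(rwDWt(0.1, 0.7, 1.0))  // positive DA strengthens the prediction
	fmt.Println(rwDWt(0.1, -0.3, 1.0)) // negative DA weakens it
}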

func (*RWPrjn) DWt

func (pj *RWPrjn) DWt()

DWt computes the weight change (learning) -- on sending projections.

func (*RWPrjn) Defaults

func (pj *RWPrjn) Defaults()

func (*RWPrjn) WtFmDWt

func (pj *RWPrjn) WtFmDWt()

WtFmDWt updates the synaptic weight values from delta-weight changes -- on sending projections

type SendACh added in v1.1.0

type SendACh emer.LayNames

SendACh is a list of layers to send acetylcholine to

func (*SendACh) Add added in v1.1.0

func (sd *SendACh) Add(laynm ...string)

Add adds given layer name(s) to list

func (*SendACh) AddAllBut added in v1.1.0

func (sd *SendACh) AddAllBut(net emer.Network, excl []string)

AddAllBut adds all layers in the network except those in the exclude list

func (*SendACh) AddOne added in v1.1.15

func (sd *SendACh) AddOne(laynm string)

AddOne adds one layer name to the list. This is the Python version, which doesn't support varargs.

func (*SendACh) SendACh added in v1.1.0

func (sd *SendACh) SendACh(net emer.Network, ach float32)

SendACh sends acetylcholine to list of layers

func (*SendACh) Validate added in v1.1.0

func (sd *SendACh) Validate(net emer.Network, ctxt string) error

Validate ensures that the LayNames layers are valid. ctxt is a string for the error message, to provide context.

type SendDA

type SendDA emer.LayNames

SendDA is a list of layers to send dopamine to

func (*SendDA) Add

func (sd *SendDA) Add(laynm ...string)

Add adds given layer name(s) to list

func (*SendDA) AddAllBut

func (sd *SendDA) AddAllBut(net emer.Network, excl []string)

AddAllBut adds all layers in the network except those in the exclude list

func (*SendDA) AddOne added in v1.1.15

func (sd *SendDA) AddOne(laynm string)

AddOne adds one layer name to the list. This is the Python version, which doesn't support varargs.

func (*SendDA) SendDA

func (sd *SendDA) SendDA(net emer.Network, da float32)

SendDA sends dopamine to list of layers

func (*SendDA) Validate

func (sd *SendDA) Validate(net emer.Network, ctxt string) error

Validate ensures that the LayNames layers are valid. ctxt is a string for the error message, to provide context.

type TDDaLayer

type TDDaLayer struct {
	leabra.Layer
	SendDA   SendDA  `desc:"list of layers to send dopamine to"`
	RewInteg string  `desc:"name of TDRewIntegLayer from which this computes the temporal derivative"`
	DA       float32 `desc:"dopamine value for this layer"`
}

TDDaLayer computes a dopamine (DA) signal as the temporal difference (TD) between the TDRewIntegLayer activations in the minus and plus phase.
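
The resulting signal is just the plus-minus difference of the RewInteg values; an illustrative sketch:

package main

import "fmt"

// tdDA sketches the TD dopamine signal: the TDRewIntegLayer activation in the
// plus phase (discounted V(t+1) + r(t)) minus its activation in the minus
// phase (V(t)).
func tdDA(integMinus, integPlus float32) float32 {
	return integPlus - integMinus
}

func main() {
	fmt.Println(tdDA(0.5, 0.8)) // positive: outcome better than predicted
}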

func (*TDDaLayer) ActFmG

func (ly *TDDaLayer) ActFmG(ltime *leabra.Time)

func (*TDDaLayer) Build

func (ly *TDDaLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*TDDaLayer) CyclePost

func (ly *TDDaLayer) CyclePost(ltime *leabra.Time)

CyclePost is called at the end of Cycle. We use it to send DA, which will then be active for the next cycle of processing.

func (*TDDaLayer) Defaults

func (ly *TDDaLayer) Defaults()

func (*TDDaLayer) GetDA

func (ly *TDDaLayer) GetDA() float32

func (*TDDaLayer) RewIntegLayer

func (ly *TDDaLayer) RewIntegLayer() (*TDRewIntegLayer, error)

func (*TDDaLayer) SetDA

func (ly *TDDaLayer) SetDA(da float32)

type TDRewIntegLayer

type TDRewIntegLayer struct {
	leabra.Layer
	RewInteg TDRewIntegParams `desc:"parameters for reward integration"`
	DA       float32          `desc:"dopamine value for this layer"`
}

TDRewIntegLayer is the temporal differences reward integration layer. It represents the estimated value V(t) in the minus phase, and the estimated V(t+1) + r(t) in the plus phase. It computes r(t) from (typically fixed) weights from a reward layer, and directly accesses values from the RewPred layer.
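
A sketch of the two phase values, following the description above (treating Discount as multiplying the future RewPred prediction is an assumption; the helper is illustrative):

package main

import "fmt"

// tdRewInteg sketches the TDRewIntegLayer values across phases: the minus
// phase represents V(t) from RewPred, and the plus phase represents
// discount*V(t+1) + r(t), where discount corresponds to
// TDRewIntegParams.Discount.
func tdRewInteg(vT, vT1, rT, discount float32) (minus, plus float32) {
	minus = vT
	plus = discount*vT1 + rT
	return minus, plus
}

func main() {
	minus, plus := tdRewInteg(0.5, 0.6, 0.2, 0.9)
	fmt.Println(minus, plus, plus-minus) // plus-minus is the TD dopamine signal
}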

func (*TDRewIntegLayer) ActFmG

func (ly *TDRewIntegLayer) ActFmG(ltime *leabra.Time)

func (*TDRewIntegLayer) Build

func (ly *TDRewIntegLayer) Build() error

Build constructs the layer state, including calling Build on the projections.

func (*TDRewIntegLayer) Defaults

func (ly *TDRewIntegLayer) Defaults()

func (*TDRewIntegLayer) GetDA

func (ly *TDRewIntegLayer) GetDA() float32

func (*TDRewIntegLayer) RewPredLayer

func (ly *TDRewIntegLayer) RewPredLayer() (*TDRewPredLayer, error)

func (*TDRewIntegLayer) SetDA

func (ly *TDRewIntegLayer) SetDA(da float32)

type TDRewIntegParams

type TDRewIntegParams struct {
	Discount float32 `desc:"discount factor -- how much to discount the future prediction from RewPred"`
	RewPred  string  `desc:"name of TDRewPredLayer to get reward prediction from "`
}

TDRewIntegParams are params for reward integrator layer

func (*TDRewIntegParams) Defaults

func (tp *TDRewIntegParams) Defaults()

type TDRewPredLayer

type TDRewPredLayer struct {
	leabra.Layer
	DA float32 `inactive:"+" desc:"dopamine value for this layer"`
}

TDRewPredLayer is the temporal differences reward prediction layer. It represents the estimated value V(t) in the minus phase, and computes the estimated V(t+1) based on its learned weights in the plus phase. Use TDRewPredPrjn for DA-modulated learning.

func (*TDRewPredLayer) ActFmG

func (ly *TDRewPredLayer) ActFmG(ltime *leabra.Time)

ActFmG computes linear activation for TDRewPred

func (*TDRewPredLayer) GetDA

func (ly *TDRewPredLayer) GetDA() float32

func (*TDRewPredLayer) SetDA

func (ly *TDRewPredLayer) SetDA(da float32)

type TDRewPredPrjn

type TDRewPredPrjn struct {
	leabra.Prjn
}

TDRewPredPrjn does dopamine-modulated learning for reward prediction: DWt = Da * Send.ActQ0 (activity on the *previous* timestep). Use in TDRewPredLayer, typically to generate reward predictions. Has no weight bounds or limits on sign, etc.
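
A sketch of this weight change (again folding in an illustrative learning-rate factor):

package main

import "fmt"

// tdRewPredDWt sketches the TDRewPredPrjn weight change: dopamine times the
// sending activity from the *previous* timestep (Send.ActQ0), with no weight
// bounds or sign limits.
func tdRewPredDWt(lrate, da, sendActQ0 float32) float32 {
	return lrate * da * sendActQ0
}

func main() {
	fmt.Println(tdRewPredDWt(0.1, 0.3, 0.8))
}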

func (*TDRewPredPrjn) DWt

func (pj *TDRewPredPrjn) DWt()

DWt computes the weight change (learning) -- on sending projections.

func (*TDRewPredPrjn) Defaults

func (pj *TDRewPredPrjn) Defaults()

func (*TDRewPredPrjn) WtFmDWt

func (pj *TDRewPredPrjn) WtFmDWt()

WtFmDWt updates the synaptic weight values from delta-weight changes -- on sending projections
