Standard models of biologically realistic, or biologically inspired, reinforcement learning employ a global error signal, which implies shallow networks. Local learning rules, by contrast, allow networks with multiple layers. Here we present a network combining local learning with global modulation, in which neuromodulation controls the amount of plasticity across the whole network while the sign of the error is passed through the network via a bottom-up pathway. Neuromodulation can be understood as a rectified error, or relevance, signal, while the bottom-up sign of the error decides between long-term potentiation and long-term depression. We demonstrate the performance of this paradigm on a real robotic task as a proof of concept.
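The rule sketched in the abstract can be illustrated as a three-factor plasticity update: a local Hebbian term per synapse, scaled by a global rectified-error signal (the "neuromodulation") and signed by the bottom-up error sign. The following is a minimal sketch under stated assumptions, not the paper's actual implementation: the network sizes, tanh activations, scalar mean-error summary, learning rate, and outer-product Hebbian term are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network; sizes and activations are illustrative assumptions.
W1 = rng.normal(scale=0.1, size=(4, 8))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(8, 2))   # hidden -> output weights

def forward(x):
    h = np.tanh(x @ W1)   # hidden-layer activity
    y = np.tanh(h @ W2)   # output-layer activity
    return h, y

def local_update(x, target, lr=0.05):
    """One plasticity step: a global, rectified error ("neuromodulation")
    scales how much every synapse changes, while the sign of the error
    selects between potentiation (LTP, +) and depression (LTD, -).
    The Hebbian pre*post term is purely local to each layer."""
    global W1, W2
    h, y = forward(x)
    error = float(np.mean(target - y))   # scalar global error summary (assumed form)
    modulation = abs(error)              # rectified error: amount of plasticity
    direction = np.sign(error)           # error sign: potentiate or depress
    W1 += lr * modulation * direction * np.outer(x, h)
    W2 += lr * modulation * direction * np.outer(h, y)
    return error

x = rng.normal(size=4)
_, y0 = forward(x)
```

Note the separation this sketch is meant to show: each weight change uses only locally available pre- and post-synaptic activity, while the two globally broadcast quantities, the rectified magnitude and the sign, gate and orient that change everywhere at once; when the error is zero, the modulation vanishes and no synapse moves.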