Standard models of biologically realistic, or inspired, reinforcement learning employ a global error signal, which implies shallow networks. Deep networks, however, could offer drastically superior performance, but training them by feeding the error signal backwards through the network is not biologically realistic, as it requires symmetric weights between the top-down and bottom-up pathways. Instead, we present a network combining local learning with global modulation, where neuromodulation controls the amount of plasticity change in the whole network, while only the sign of the error is backpropagated through the network. The neuromodulation can be understood as a rectified error, or relevance, signal, while the backpropagated sign of the error decides between long-term potentiation and long-term depression. We demonstrate the performance of this paradigm on a real robotic task.
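The learning rule described above can be sketched in a few lines. The following is a minimal, hypothetical two-layer example (layer sizes, learning rate, and the mean-absolute-error choice of neuromodulatory signal are all illustrative assumptions, not the paper's exact implementation): the magnitude of the error acts as a single global plasticity gain, while only the sign of the error travels backwards to select potentiation versus depression at each synapse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-layer network (sizes are arbitrary assumptions)
W1 = rng.normal(0.0, 0.1, (4, 8))
W2 = rng.normal(0.0, 0.1, (8, 2))

def forward(x):
    h = np.tanh(x @ W1)   # hidden activity
    y = h @ W2            # linear output
    return h, y

x = rng.normal(size=4)
target = np.array([1.0, -1.0])

h, y = forward(x)
err = target - y

# Global neuromodulatory signal: a rectified error ("relevance") magnitude
m = np.abs(err).mean()

# Only the sign of the error is propagated backwards through the weights;
# the sign decides between potentiation (+) and depression (-)
s_out = np.sign(err)
s_hidden = np.sign(s_out @ W2.T)

lr = 0.1  # illustrative learning rate
# Weight change = global modulation (m) x local pre-activity x backpropagated sign
W2 += lr * m * np.outer(h, s_out)
W1 += lr * m * np.outer(x, s_hidden)
```

Note that no exact error magnitudes flow backwards, so the top-down pathway never needs to mirror the bottom-up weights precisely; the single scalar `m` is broadcast to all synapses, consistent with a diffuse neuromodulatory signal.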