Navigation to multiple cued reward locations is increasingly used to study rodent learning. Although deep reinforcement learning agents can learn such tasks, they are not biologically plausible. Biologically plausible classic actor-critic agents have been shown to learn to navigate to a single reward location, but it has remained unclear which biologically plausible agents can learn multiple cue-reward location tasks. In this computational study, we show that versions of the classic agents learn to navigate to a single reward location and adapt when the reward location is displaced, but are unable to learn multiple paired association navigation. This limitation is overcome by an agent in which place cell and cue information are first processed by a feedforward nonlinear hidden layer whose synapses onto the actor and critic are subject to temporal difference error-modulated plasticity. Faster learning is obtained when the feedforward layer is replaced by a recurrent reservoir network.
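As a rough illustration of the agent described above, the sketch below implements a minimal actor-critic readout on top of a fixed nonlinear hidden layer that mixes place-cell and cue inputs, with only the hidden-to-actor and hidden-to-critic synapses updated, gated by the temporal difference error. This is a minimal sketch, not the authors' implementation: the layer sizes, tanh nonlinearity, fixed random input weights, softmax action policy, learning rates, and function names (`hidden_activity`, `select_action`, `td_update`) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and hyperparameters (assumptions, not the paper's values).
N_PLACE, N_CUE, N_HIDDEN, N_ACTIONS = 64, 4, 128, 8
GAMMA, LR_ACTOR, LR_CRITIC = 0.95, 0.01, 0.05

# Fixed random expansion of the concatenated place-cell and cue input.
# Assumption: these input weights are not plastic; only the readout synapses
# onto the actor and critic are updated.
W_in = rng.normal(scale=1.0 / np.sqrt(N_PLACE + N_CUE),
                  size=(N_HIDDEN, N_PLACE + N_CUE))
W_actor = np.zeros((N_ACTIONS, N_HIDDEN))  # hidden -> actor synapses (plastic)
w_critic = np.zeros(N_HIDDEN)              # hidden -> critic synapses (plastic)


def hidden_activity(place, cue):
    """Nonlinear feedforward hidden layer mixing place-cell and cue inputs."""
    return np.tanh(W_in @ np.concatenate([place, cue]))


def select_action(place, cue):
    """Softmax (illustrative) policy over discrete heading directions."""
    h = hidden_activity(place, cue)
    logits = W_actor @ h
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(N_ACTIONS, p=probs), h, probs


def td_update(h, probs, action, reward, next_place, next_cue, done):
    """TD-error-modulated plasticity of the readout synapses only."""
    value = w_critic @ h
    next_value = 0.0 if done else w_critic @ hidden_activity(next_place, next_cue)
    td_error = reward + GAMMA * next_value - value

    # Critic: TD(0) update of the hidden-to-critic weights.
    w_critic[:] += LR_CRITIC * td_error * h

    # Actor: policy-gradient-style update gated by the same TD error.
    grad_logits = -probs
    grad_logits[action] += 1.0
    W_actor[:] += LR_ACTOR * td_error * np.outer(grad_logits, h)
    return td_error
```

The recurrent variant mentioned at the end of the abstract would correspond, in this sketch, to replacing `hidden_activity` with the state of a recurrent reservoir of fixed random units while keeping the same TD-error-modulated updates of the readout synapses.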