Despite extensive theoretical work on biologically plausible learning rules, it has been difficult to obtain clear evidence about whether and how such rules are implemented in the brain. We consider biologically plausible supervised- and reinforcement-learning rules and ask whether changes in network activity during learning can be used to determine which learning rule is being used. Supervised learning requires a credit-assignment model estimating the mapping from neural activity to behavior, and, in a biological organism, this model will inevitably be an imperfect approximation of the ideal mapping, leading to a bias in the direction of the weight updates relative to the true gradient. Reinforcement learning, on the other hand, requires no credit-assignment model and tends to make weight updates following the true gradient direction. We derive a metric to distinguish between learning rules by observing changes in the network activity during learning, given that the mapping from brain to behavior is known by the experimenter. Because brain-machine interface (BMI) experiments allow for perfect knowledge of this mapping, we focus on modeling a cursor-control BMI task using recurrent neural networks, showing that learning rules can be distinguished in simulated experiments using only observations that a neuroscience experimenter would plausibly have access to.
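To make the contrast between the two families of learning rules concrete, the following is a minimal sketch in a toy linear setting. It is not the recurrent-network model or the metric derived in this work. It assumes a hypothetical linear BMI decoder `D` mapping neural activity to a 2-D cursor position, a hypothetical noisy internal estimate `D_hat` standing in for the imperfect credit-assignment model, and node perturbation as one example of a reinforcement-learning rule. All names and parameter values (`simulate`, `model_error`, `sigma`, and so on) are illustrative assumptions. The sketch compares each rule's update direction in activity space with the true negative gradient, which the experimenter can compute because `D` is known.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two arrays, treated as flat vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def simulate(n_neurons=20, n_trials=5000, model_error=1.0, sigma=0.01, seed=0):
    """Return (cos_supervised, cos_reinforcement) relative to the true gradient.

    Toy linear model (an assumption of this sketch): cursor = D @ activity.
    """
    rng = np.random.default_rng(seed)

    # True BMI decoder, known exactly to the experimenter.
    D = rng.normal(size=(2, n_neurons)) / np.sqrt(n_neurons)
    # Hypothetical imperfect internal estimate of the decoder
    # (the credit-assignment model used by the supervised rule).
    D_hat = D + model_error * rng.normal(size=D.shape) / np.sqrt(n_neurons)

    h = rng.normal(size=n_neurons)   # current neural activity
    target = rng.normal(size=2)      # desired cursor position

    def loss(activity):
        # Squared cursor error; accepts one activity vector or a batch of them.
        err = activity @ D.T - target
        return 0.5 * np.sum(err ** 2, axis=-1)

    err = D @ h - target
    true_step = -(D.T @ err)         # true negative gradient w.r.t. activity
    sl_step = -(D_hat.T @ err)       # supervised step, routed through D_hat

    # Node perturbation: perturb the activity, correlate the change in reward
    # with the perturbation, and average over trials.
    xi = sigma * rng.normal(size=(n_trials, n_neurons))
    reward_change = -(loss(h + xi) - loss(h))
    rl_step = (reward_change[:, None] * xi).mean(axis=0) / sigma ** 2

    return cosine(sl_step, true_step), cosine(rl_step, true_step)


if __name__ == "__main__":
    cos_sl, cos_rl = simulate()
    print(f"supervised (imperfect model): cos = {cos_sl:.3f}")
    print(f"node perturbation (averaged): cos = {cos_rl:.3f}")
```

In this toy setting, the supervised step stays misaligned with the true gradient however many trials are observed, because the error in `D_hat` is systematic. The trial-averaged node-perturbation step converges toward the true gradient direction, although any single perturbation trial is very noisy. Setting `model_error=0.0` removes the bias of the supervised step entirely, which illustrates that the bias comes from the credit-assignment model rather than from supervised learning as such.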