Despite extensive theoretical work on biologically plausible learning rules, clear evidence about whether and how such rules are implemented in the brain has been difficult to obtain. We consider biologically plausible supervised- and reinforcement-learning rules and ask whether changes in network activity during learning can be used to determine which learning rule is being used. Supervised learning requires a credit-assignment model estimating the mapping from neural activity to behavior, and, in a biological organism, this model will inevitably be an imperfect approximation of the ideal mapping, leading to a bias in the direction of the weight updates relative to the true gradient. Reinforcement learning, on the other hand, requires no credit-assignment model and tends to make weight updates following the true gradient direction. We derive a metric to distinguish between learning rules by observing changes in the network activity during learning, given that the mapping from brain to behavior is known by the experimenter. Because brain-machine interface (BMI) experiments allow for precise knowledge of this mapping, we model a cursor-control BMI task using recurrent neural networks, showing that learning rules can be distinguished in simulated experiments using only observations that a neuroscience experimenter would plausibly have access to.
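The contrast between the two update directions can be illustrated with a minimal numerical sketch. It is not the metric or the recurrent-network model referred to above. It assumes a single feedforward linear layer, a quadratic loss, a hypothetical imperfect internal model `M` (the true readout `D` plus random error), and node perturbation as a stand-in reinforcement-learning rule. All names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 10, 20, 2  # illustrative sizes (assumption)

W = rng.normal(size=(n_hid, n_in)) / np.sqrt(n_in)    # weights being learned
D = rng.normal(size=(n_out, n_hid)) / np.sqrt(n_hid)  # true activity-to-behavior map (as in a BMI decoder)
M = D + rng.normal(size=D.shape) / np.sqrt(n_hid)     # hypothetical imperfect credit-assignment model

x = rng.normal(size=n_in)          # input
y_target = rng.normal(size=n_out)  # desired behavior


def loss(h):
    """Quadratic loss on behavior produced by activity h through the true map D."""
    err = D @ h - y_target
    return 0.5 * err @ err


def cosine(a, b):
    """Cosine similarity between two arrays, flattened."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


h = W @ x
err = D @ h - y_target

# True gradient of the loss with respect to W.
true_grad = np.outer(D.T @ err, x)

# Supervised-style update: error is passed back through the imperfect model M.
sl_update = -np.outer(M.T @ err, x)

# Node-perturbation update (assumed RL rule): perturb activity, correlate the
# change in reward (negative loss) with the perturbation, average over samples.
sigma, n_samples = 0.01, 5000
xi = sigma * rng.normal(size=(n_samples, n_hid))
reward_change = -(np.array([loss(h + z) for z in xi]) - loss(h))
rl_direction = (reward_change[:, None] * xi).mean(axis=0) / sigma**2
rl_update = np.outer(rl_direction, x)

cos_sl = cosine(sl_update, -true_grad)
cos_rl = cosine(rl_update, -true_grad)
print(f"alignment with true descent direction: SL (biased model) = {cos_sl:.3f}, RL (averaged) = {cos_rl:.3f}")
```

In this toy setting the averaged perturbation-based update aligns closely with the true descent direction, while the model-based update is tilted by the error in `M`. Because `D` is known to the experimenter, such a tilt is in principle observable, which is the intuition behind using a BMI task.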