An important open question in computational neuroscience is how spatially tuned neurons, such as place cells, support an animal's learning of reward-seeking behavior. Existing computational models either lack biological plausibility or fall short of behavioral flexibility when environments change. In this paper, we propose a computational theory that achieves behavioral flexibility with greater biological plausibility. We first fit a mixture of Gaussian distributions to model the ensemble of place-cell firing fields. We then propose a Hebbian-like rule to learn the synaptic strength matrix among place cells. This matrix is interpreted as the transition rate matrix of a continuous-time Markov chain, which generates sequential replay of place cells. During replay, the synaptic strengths from place cells to medium spiny neurons (MSNs) are learned by a temporal-difference-like rule that stores place-reward associations. After replay, MSN activation ramps up as the animal approaches the rewarded location, so the animal can move in the direction of increasing MSN activation to find the reward. We implement our theory in a high-fidelity virtual rat in the MuJoCo physics simulator. In a complex maze, the rat shows significantly better learning efficiency and behavioral flexibility than a rat controlled by a neuroscience-inspired reinforcement learning algorithm, the deep Q-network.
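To make the pipeline concrete, the sketch below walks through the steps named in the abstract in NumPy: Gaussian place fields, a Hebbian-like co-activity rule for the place-cell synaptic matrix, replay sampled from that matrix treated as a Markov transition structure, and a TD-like update of place-cell-to-MSN weights. All specifics (the arena, cell count, learning rates, and reward radius) are illustrative assumptions, not the paper's actual parameters or equations.

```python
# Minimal sketch (assumed, not the authors' code) of the place-cell-to-MSN pipeline.
import numpy as np

rng = np.random.default_rng(0)

# 1. Place cells: Gaussian firing fields tiling a 1 x 1 arena (stand-in for the
#    mixture-of-Gaussians model of recorded firing fields).
N_CELLS = 100
centers = rng.uniform(0.0, 1.0, size=(N_CELLS, 2))
sigma = 0.1

def place_activity(pos):
    """Firing rate of each place cell at position `pos` (Gaussian bumps)."""
    d2 = np.sum((centers - pos) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# 2. Hebbian-like learning of the place-cell synaptic matrix W from a random walk:
#    co-activity of successively visited locations strengthens W[i, j].
W = np.zeros((N_CELLS, N_CELLS))
pos = np.array([0.5, 0.5])
eta_hebb = 0.01
for _ in range(20000):
    prev = place_activity(pos)
    pos = np.clip(pos + rng.normal(0.0, 0.03, size=2), 0.0, 1.0)
    curr = place_activity(pos)
    W += eta_hebb * np.outer(prev, curr)   # pre-then-post co-activity
np.fill_diagonal(W, 0.0)

# 3. Treat W as the transition-rate matrix of a continuous-time Markov chain and
#    sample a replay trajectory over place cells (embedded jump chain).
def replay(start, n_steps):
    states = [start]
    for _ in range(n_steps):
        rates = W[states[-1]]
        states.append(rng.choice(N_CELLS, p=rates / rates.sum()))
    return states

# 4. During replay, learn place-cell-to-MSN weights with a TD(0)-style rule so
#    that MSN activation ramps up toward the rewarded location.
msn_w = np.zeros(N_CELLS)
reward_pos = np.array([0.9, 0.9])
gamma, eta_td = 0.95, 0.05
for _ in range(200):
    traj = replay(int(rng.integers(N_CELLS)), 50)
    for s, s_next in zip(traj[:-1], traj[1:]):
        r = 1.0 if np.linalg.norm(centers[s_next] - reward_pos) < 0.1 else 0.0
        msn_w[s] += eta_td * (r + gamma * msn_w[s_next] - msn_w[s])

# 5. Navigation: MSN activation at a position is the weighted sum of place-cell
#    activity; the agent climbs this activation gradient to reach the reward.
def msn_activation(pos):
    return float(msn_w @ place_activity(pos))
```

In this sketch the virtual rat would simply compare `msn_activation` at nearby candidate positions and step toward the largest value, which is one plausible reading of "moving along the direction where MSN activation is increasing."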