Humans and animals can learn new skills after practicing for a few hours, while current reinforcement learning algorithms require a large amount of data to achieve good performance. Recent model-based approaches show promising results by reducing the number of interactions with the environment needed to learn a desirable policy. However, these methods require biologically implausible ingredients, such as the detailed storage of older experiences and long periods of offline learning. The optimal way to learn and exploit world models is still an open question. Taking inspiration from biology, we suggest that dreaming might be an efficient way to exploit an inner model. We propose a two-module (agent and model) neural network in which "dreaming" (living new experiences in a model-based simulated environment) significantly boosts learning. We also explore "planning", an online alternative to dreaming, which achieves comparable performance. Importantly, our model does not require the detailed storage of experiences, and learns the world model online. This is a key ingredient for biological plausibility and implementability (e.g., in neuromorphic hardware). Our network is composed of spiking neurons, further increasing the energy efficiency and the plausibility of the model. To our knowledge, there are no previous works proposing biologically plausible model-based reinforcement learning in recurrent spiking networks. Our work is a step toward building efficient neuromorphic systems for autonomous robots, capable of learning new skills in real-world environments. Even when the environment is no longer accessible, the robot optimizes learning by "reasoning" in its own "mind". These approaches are of great relevance when acquisition from the environment is slow, expensive (robotics) or unsafe (autonomous driving).
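The abstract does not give algorithmic detail, so the following is only a minimal Dyna-style sketch of the core idea it describes: a world model learned online from each transition (no replay buffer of raw experiences), then used for "dreaming", i.e., value updates on imagined transitions sampled from the model alone. The 1-D chain task, the tabular model, and all names are illustrative assumptions, not the paper's spiking-network architecture.

```python
import random

# Hypothetical 1-D chain task: states 0..4, actions -1/+1, reward 1 at the goal.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, 1)

def step(s, a):
    """Real environment dynamics (illustrative stand-in)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

# World model learned online: predicted (next_state, reward) per (state, action).
model = {}
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma = 0.5, 0.9

def q_update(s, a, r, s2):
    """Standard Q-learning update, used for both real and dreamed transitions."""
    best = max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

random.seed(0)
for episode in range(20):
    s = 0
    while s != GOAL:
        a = random.choice(ACTIONS)      # exploratory behavior
        s2, r = step(s, a)
        model[(s, a)] = (s2, r)         # update the world model online
        q_update(s, a, r, s2)           # learn from the real transition
        s = s2
    # "Dreaming": the environment is no longer queried; imagined transitions
    # are drawn from the learned model only.
    for _ in range(50):
        s, a = random.choice(list(model))
        s2, r = model[(s, a)]
        q_update(s, a, r, s2)

greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(greedy)  # greedy policy should move toward the goal (+1 everywhere)
```

The dreamed updates let reward information propagate along the chain far faster than real interaction alone, which is the data-efficiency benefit the abstract attributes to dreaming.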