Lifelong learning aims to create AI systems that continuously and incrementally learn during a lifetime, similar to biological learning. Attempts so far have encountered problems, including catastrophic forgetting, interference among tasks, and the inability to exploit previous knowledge. While considerable research has focused on learning multiple input distributions, typically in classification, lifelong reinforcement learning (LRL) must also deal with variations in the state and transition distributions, and in the reward functions. Modulating masks, recently developed for classification, are particularly suitable for dealing with such a large spectrum of task variations. In this paper, we adapted modulating masks to work with deep LRL, specifically with PPO and IMPALA agents. The comparison with LRL baselines on both discrete and continuous RL tasks shows superior performance. We further investigated the use of a linear combination of previously learned masks to exploit previous knowledge when learning new tasks: not only is learning faster, but the algorithm also solves tasks that could not otherwise be solved from scratch due to extremely sparse rewards. The results suggest that RL with modulating masks is a promising approach to lifelong learning, to the composition of knowledge for learning increasingly complex tasks, and to knowledge reuse for efficient and faster learning.