As artificial and robotic systems are increasingly deployed and relied upon in real-world applications, it is important that they be able to continually learn and adapt in dynamically changing environments, becoming Lifelong Learning Machines. Continual, or lifelong, learning (LL) involves minimizing catastrophic forgetting of old tasks while maximizing a model's capability to learn new tasks. This paper addresses the challenging lifelong reinforcement learning (L2RL) setting. Advancing the state of the art in L2RL and making it useful for practical applications requires more than developing individual L2RL algorithms; it requires progress at the systems level, especially on the non-trivial problem of integrating multiple L2RL algorithms into a common framework. We introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF), which standardizes L2RL systems and assimilates different continual learning components, each addressing a different aspect of the lifelong learning problem, into a unified system. As an instantiation of L2RLCF, we develop a standard API that allows easy integration of novel lifelong learning components. We describe a case study that demonstrates how multiple independently developed LL components can be integrated into a single realized system. We also introduce an evaluation environment for measuring the effect of combining various system components. It employs different LL scenarios (sequences of tasks) consisting of StarCraft II minigames and allows fair, comprehensive, and quantitative comparison of different combinations of components under challenging, shared conditions.
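To make the idea of a standard component API concrete, the following is a minimal sketch of how independently written components might plug into one framework through shared hooks. It is not the L2RLCF API: every name here (`LifelongComponent`, `ReplayMemory`, `TaskLog`, `Framework`, the hook methods, and the placeholder task names) is a hypothetical chosen for illustration, and the scenario is a plain list of tasks with scripted rewards rather than real minigames.

```python
from dataclasses import dataclass, field
from typing import Protocol


class LifelongComponent(Protocol):
    """Hypothetical hook interface; NOT the actual L2RLCF API."""

    def on_task_start(self, task: str) -> None: ...
    def on_experience(self, task: str, reward: float) -> None: ...
    def on_task_end(self, task: str) -> None: ...


@dataclass
class ReplayMemory:
    """Toy component: keeps past experiences so old tasks could be rehearsed."""

    memory: list = field(default_factory=list)

    def on_task_start(self, task: str) -> None:
        pass

    def on_experience(self, task: str, reward: float) -> None:
        self.memory.append((task, reward))

    def on_task_end(self, task: str) -> None:
        pass


@dataclass
class TaskLog:
    """Toy component: records the order in which tasks were presented."""

    seen: list = field(default_factory=list)

    def on_task_start(self, task: str) -> None:
        self.seen.append(task)

    def on_experience(self, task: str, reward: float) -> None:
        pass

    def on_task_end(self, task: str) -> None:
        pass


@dataclass
class Framework:
    """Dispatches every lifecycle event to all registered components."""

    components: list = field(default_factory=list)

    def register(self, component: LifelongComponent) -> None:
        self.components.append(component)

    def run_scenario(self, scenario) -> None:
        # A scenario is a sequence of (task name, scripted rewards) pairs.
        for task, rewards in scenario:
            for c in self.components:
                c.on_task_start(task)
            for reward in rewards:
                for c in self.components:
                    c.on_experience(task, reward)
            for c in self.components:
                c.on_task_end(task)


framework = Framework()
replay, log = ReplayMemory(), TaskLog()
framework.register(replay)
framework.register(log)
framework.run_scenario([("minigame_a", [1.0, 0.5]), ("minigame_b", [0.0])])
```

In this sketch the framework never needs to know what a component does internally; adding a new component only requires implementing the three hooks, which is the kind of decoupling a common API is meant to provide.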