Continual learning (CL) is a branch of machine learning that aims to enable agents to adapt and generalise previously learned abilities so that they can be reapplied to new tasks or environments. This is particularly useful in multi-task settings or in non-stationary environments whose dynamics can change over time, as is the case in cyber-physical systems such as autonomous driving. However, despite recent advances in CL, successfully applying it to reinforcement learning (RL) remains an open problem. This paper highlights open challenges in continual RL (CRL) based on experiments in an autonomous driving environment, in which the agent must learn to park successfully in four different scenarios corresponding to parking spaces oriented at varying angles. The agent is trained in these four scenarios one after another, representing a CL setting, using Proximal Policy Optimisation (PPO). These experiments exposed a number of open challenges in CRL: finding suitable abstractions of the environment, oversensitivity to hyperparameters, catastrophic forgetting, and efficient use of neural network capacity. Based on these identified challenges, we present open research questions that must be addressed to create robust CRL systems. The identified challenges also call into question the suitability of neural networks for CL. Finally, we identify the need for interdisciplinary research, in particular between computer science and neuroscience.