While deep reinforcement learning (RL) has fueled multiple high-profile successes in machine learning, it is held back from more widespread adoption by its often poor data efficiency and the limited generality of the policies it produces. A promising approach for alleviating these limitations is to cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL. Meta-RL is most commonly studied in a problem setting where, given a distribution of tasks, the goal is to learn a policy that is capable of adapting to any new task from the task distribution with as little data as possible. In this survey, we describe the meta-RL problem setting in detail as well as its major variations. We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task. Using these clusters, we then survey meta-RL algorithms and applications. We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
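To make the problem setting described above concrete, a minimal sketch of the usual meta-RL objective follows (the symbols $p(\mathcal{M})$, $f_\theta$, $\mathcal{D}$, $K$, and $H$ are illustrative assumptions for this sketch, not notation quoted from the survey): the aim is to learn parameters of an adaptation procedure that maximize expected return on tasks drawn from the task distribution after a limited adaptation budget,

$$
\max_\theta \; \mathbb{E}_{\mathcal{M} \sim p(\mathcal{M})}\Big[\, \mathbb{E}\big[\, \textstyle\sum_{\tau \in \mathcal{D}_{K:H}} G(\tau) \;\big|\; f_\theta,\, \mathcal{M} \,\big] \Big],
$$

where $p(\mathcal{M})$ is the distribution over tasks (MDPs), $f_\theta$ is the meta-learned procedure that maps data collected in task $\mathcal{M}$ to a policy, $\mathcal{D}$ is the data gathered during a trial, $G(\tau)$ is the return of episode $\tau$, and the first $K$ of $H$ episodes are reserved for adaptation before performance is evaluated. Smaller $K$ corresponds to the few-shot regime emphasized in the abstract ("with as little data as possible").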