The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to produce RL algorithms whose policies generalise well to novel, unseen situations at deployment time, avoiding overfitting to their training environments. Tackling this is vital if we are to deploy reinforcement learning algorithms in real-world scenarios, where the environment will be diverse, dynamic and unpredictable. This survey provides an overview of this nascent field. Building upon previous works, we rely on a unifying formalism and terminology for discussing different ZSG problems. We go on to categorise existing benchmarks for ZSG, as well as current methods for tackling these problems. Finally, we provide a critical discussion of the current state of the field, including recommendations for future work. Among other conclusions, we argue that taking a purely procedural content generation approach to benchmark design is not conducive to progress in ZSG, we suggest fast online adaptation and tackling RL-specific problems as promising areas for future work on methods for ZSG, and we recommend building benchmarks in underexplored problem settings such as offline RL ZSG and reward-function variation.