Learning from visual observations is a fundamental yet challenging problem in Reinforcement Learning (RL). Although algorithmic advances combined with convolutional neural networks have proved to be a recipe for success, current methods are still lacking on two fronts: (a) data-efficiency of learning and (b) generalization to new environments. To this end, we present Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can enhance most RL algorithms. We perform the first extensive study of general data augmentations for RL on both pixel-based and state-based inputs, and introduce two new data augmentations: random translate and random amplitude scale. We show that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods across common benchmarks. RAD sets a new state-of-the-art in terms of data-efficiency and final performance on the DeepMind Control Suite benchmark for pixel-based control as well as the OpenAI Gym benchmark for state-based control. We further demonstrate that RAD significantly improves test-time generalization over existing methods on several OpenAI ProcGen benchmarks. Our RAD module and training code are available at https://www.github.com/MishaLaskin/rad.
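The two augmentations introduced above can be sketched in a few lines of numpy. This is a minimal illustrative sketch, not the released RAD implementation: random translate places each image at a random position inside a larger zero-padded frame, and random amplitude scale multiplies each state vector by a scalar drawn uniformly from an interval; the `alpha`/`beta` range below is an assumed example, not the paper's tuned values.

```python
import numpy as np

def random_translate(imgs, size):
    """Place each image at a random offset inside a larger zero-padded frame.

    imgs: array of shape (batch, channels, H, W); size: output height/width (>= H, W).
    """
    b, c, h, w = imgs.shape
    out = np.zeros((b, c, size, size), dtype=imgs.dtype)
    for i in range(b):
        top = np.random.randint(0, size - h + 1)
        left = np.random.randint(0, size - w + 1)
        out[i, :, top:top + h, left:left + w] = imgs[i]
    return out

def random_amplitude_scale(states, alpha=0.6, beta=1.2):
    """Scale each state vector by a uniform random scalar in [alpha, beta].

    states: array of shape (batch, state_dim). The [0.6, 1.2] range is an
    illustrative assumption, not the paper's exact hyperparameters.
    """
    scale = np.random.uniform(alpha, beta, size=(states.shape[0], 1))
    return states * scale
```

In practice these would be applied to each sampled minibatch before it is fed to the RL algorithm's update, which is what makes RAD a plug-and-play module: the underlying algorithm is unchanged.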