We propose a simple data augmentation technique that can be applied to standard model-free reinforcement learning algorithms, enabling robust learning directly from pixels without the need for auxiliary losses or pre-training. The approach leverages input perturbations commonly used in computer vision tasks to regularize the value function. Existing model-free approaches, such as Soft Actor-Critic (SAC), are not able to train deep networks effectively from image pixels. However, the addition of our augmentation method dramatically improves SAC's performance, enabling it to reach state-of-the-art performance on the DeepMind control suite, surpassing model-based (Dreamer, PlaNet, and SLAC) methods and recently proposed contrastive learning (CURL). Our approach can be combined with any model-free reinforcement learning algorithm, requiring only minor modifications. An implementation can be found at https://sites.google.com/view/data-regularized-q.
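Below is a minimal sketch, in PyTorch, of the kind of input perturbation the abstract refers to: replication-pad each image observation by a few pixels and take a random crop back to the original size, so the content is shifted by up to `pad` pixels. The function name `random_shift` and its exact interface are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F


def random_shift(imgs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """Randomly shift a batch of image observations.

    imgs: (B, C, H, W) float tensor of stacked pixel observations.
    Returns a tensor of the same shape, with each image translated by up to
    `pad` pixels in each direction (borders filled by replication padding).
    """
    b, c, h, w = imgs.shape
    padded = F.pad(imgs, (pad, pad, pad, pad), mode="replicate")
    # Independent random crop offsets for every image in the batch.
    x0 = torch.randint(0, 2 * pad + 1, (b,), device=imgs.device)
    y0 = torch.randint(0, 2 * pad + 1, (b,), device=imgs.device)
    out = torch.empty_like(imgs)
    for i in range(b):
        out[i] = padded[i, :, y0[i]:y0[i] + h, x0[i]:x0[i] + w]
    return out
```

In a SAC-style update, both the sampled observation and the next observation would be passed through such an augmentation before computing Q-values and bootstrap targets, which is how the perturbations regularize the value function.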