In reinforcement learning (RL), it is challenging to learn directly from high-dimensional observations; data augmentation has recently been shown to remedy this by encoding invariances from raw pixels. Nevertheless, we empirically find that not all samples are equally important, and hence simply injecting more augmented inputs may instead cause instability in Q-learning. In this paper, we approach this problem systematically by developing a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF), which can fully exploit sample importance and improve learning efficiency in a self-supervised manner. Facilitated by the proposed contrastive curiosity, CCLF is capable of prioritizing the experience replay, selecting the most informative augmented inputs, and, more importantly, regularizing the Q-function as well as the encoder to concentrate more on under-learned data. Moreover, it encourages the agent to explore with a curiosity-based reward. As a result, the agent can focus on more informative samples and learn representation invariances more efficiently, with significantly fewer augmented inputs. We apply CCLF to several base RL algorithms and evaluate it on the DeepMind Control Suite, Atari, and MiniGrid benchmarks, where our approach demonstrates superior sample efficiency and learning performance compared with other state-of-the-art methods.
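To make the idea of contrastive curiosity concrete, the following is a minimal sketch of how a per-sample curiosity signal could be computed from the disagreement between two augmented views of the same observation and then reused to weight replay sampling. All names here (`contrastive_curiosity`, the toy encoder, the temperature) are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def contrastive_curiosity(encoder_q, encoder_k, obs_aug1, obs_aug2, temperature=0.1):
    """Per-sample curiosity: how poorly two augmented views of the same
    observation agree under an InfoNCE-style contrastive objective.
    A high score suggests the sample (or augmentation) is still under-learned."""
    with torch.no_grad():
        q = F.normalize(encoder_q(obs_aug1), dim=-1)   # (B, D) query embeddings
        k = F.normalize(encoder_k(obs_aug2), dim=-1)   # (B, D) key embeddings
        logits = q @ k.t() / temperature               # (B, B) pairwise similarities
        labels = torch.arange(q.size(0), device=q.device)
        # Larger when the positive pair is hard to pick out of the batch.
        return F.cross_entropy(logits, labels, reduction="none")

if __name__ == "__main__":
    # Toy pixel encoder and lightly perturbed "augmented" views, for illustration only.
    enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 84 * 84, 50))
    obs = torch.rand(8, 3, 84, 84)
    view1 = obs + 0.01 * torch.randn_like(obs)
    view2 = obs + 0.01 * torch.randn_like(obs)
    scores = contrastive_curiosity(enc, enc, view1, view2)
    probs = scores / scores.sum()   # e.g. prioritized replay sampling weights
    print(probs)
```

In the same spirit, such scores could plausibly serve as an intrinsic reward bonus or as per-sample weights on the Q-learning loss, which is the role the abstract attributes to contrastive curiosity.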