We present CURL: Contrastive Unsupervised Representations for Reinforcement Learning. CURL extracts high-level features from raw pixels using contrastive learning and performs off-policy control on top of the extracted features. CURL outperforms prior pixel-based methods, both model-based and model-free, on complex tasks in the DeepMind Control Suite and Atari Games, showing 1.9x and 1.6x performance gains at the 100K environment-step and 100K interaction-step benchmarks, respectively. On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample efficiency and performance of methods that use state-based features.
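The abstract does not spell out the contrastive objective. As an illustration only, the sketch below computes an InfoNCE-style loss over a batch: each query embedding treats the key at the same index as its positive (for example, an encoding of a differently augmented view of the same observation), and every other key in the batch acts as a negative. The bilinear similarity q^T W k, the batch-as-negatives scheme, and the function names `bilinear_logits` and `info_nce_loss` are assumptions made for this sketch, not details stated above; encoders, data augmentation, and the off-policy RL learner are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def bilinear_logits(queries: Matrix, keys: Matrix, W: Matrix) -> Matrix:
    """Return logits[i][j] = q_i^T W k_j for every query/key pair (assumed bilinear similarity)."""
    # Precompute W k_j once per key so each logit is a single dot product.
    wk = [[sum(W[r][c] * k[c] for c in range(len(k))) for r in range(len(W))] for k in keys]
    return [[sum(q[r] * v[r] for r in range(len(q))) for v in wk] for q in queries]


def info_nce_loss(queries: Matrix, keys: Matrix, W: Matrix) -> float:
    """Mean cross-entropy where key i is the positive for query i and the other keys are negatives."""
    logits = bilinear_logits(queries, keys, W)
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]  # -log softmax(row)[i]
    return total / len(logits)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    q = [[5.0, 0.0], [0.0, 5.0]]
    aligned = info_nce_loss(q, [[5.0, 0.0], [0.0, 5.0]], identity)  # positives match queries
    swapped = info_nce_loss(q, [[0.0, 5.0], [5.0, 0.0]], identity)  # positives mismatched
    print(aligned, swapped)
```

When each query is most similar to its own key the loss approaches zero, and when the positives are mismatched it grows large; with an all-zero `W` every logit is equal and the loss reduces to log of the batch size.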