We present CURL: Contrastive Unsupervised Representations for Reinforcement Learning. CURL extracts high-level features from raw pixels using contrastive learning and performs off-policy control on top of the extracted features. CURL outperforms prior pixel-based methods, both model-based and model-free, on complex tasks in the DeepMind Control Suite and Atari games, showing 1.9x and 1.2x performance gains, respectively, at the 100K environment-step and 100K interaction-step benchmarks. On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample efficiency of methods that use state-based features. Our code is open-sourced and available at https://github.com/MishaLaskin/curl.
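To make the contrastive step concrete, the following is a minimal sketch of an InfoNCE-style objective over two augmented views of each observation. It is not the released implementation: the random-crop augmentation, the dot-product similarity, the temperature value, and the names `random_crop`, `encode`, and `info_nce_loss` are all assumptions chosen for illustration, and `encode` is a hypothetical stand-in for a learned convolutional encoder.

```python
import math
import random


def random_crop(image, size, rng):
    """Crop a size x size window at a random offset from a 2D list-of-lists image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)    # randint is inclusive at both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def encode(crop):
    """Hypothetical stand-in encoder: flatten the crop and L2-normalise it.
    A real agent would use a learned network here (assumption)."""
    flat = [float(p) for row in crop for p in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


def info_nce_loss(queries, keys, temperature=0.1):
    """Mean cross-entropy in which keys[i] is the positive for queries[i]
    and every other key in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        total += log_sum_exp - logits[i]
    return total / len(queries)


# Usage: two independent random crops of each frame form a positive pair.
rng = random.Random(0)
frames = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(4)]
queries = [encode(random_crop(f, 6, rng)) for f in frames]
keys = [encode(random_crop(f, 6, rng)) for f in frames]
loss = info_nce_loss(queries, keys)
```

In a full agent, the features trained with a loss of this kind would feed an off-policy reinforcement learning algorithm; that control component is not shown here.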