We investigate whether self-supervised learning (SSL) can improve online reinforcement learning (RL) from pixels. We extend the contrastive reinforcement learning framework (e.g., CURL) that jointly optimizes SSL and RL losses, and conduct extensive experiments with various self-supervised losses. Our observations suggest that the existing SSL framework for RL fails to bring meaningful improvement over baselines that use only image augmentation, when the same amount of data and augmentation is applied. We further perform evolutionary searches to find the optimal combination of multiple self-supervised losses for RL, but find that even such a loss combination fails to meaningfully outperform methods that rely only on carefully designed image augmentations. After evaluating these approaches together in multiple environments, including a real-world robot environment, we confirm that no single self-supervised loss or image augmentation method dominates across all environments, and that the current framework for jointly optimizing SSL and RL is limited. Finally, we conduct ablation studies on multiple factors and demonstrate the properties of the representations learned with the different approaches.
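For concreteness, the following is a minimal sketch, not the paper's implementation, of the kind of joint objective such a framework optimizes: a simplified CURL-style InfoNCE contrastive loss on two augmented views of each observation (omitting the momentum encoder and bilinear similarity CURL itself uses), added to the RL loss with a weighting coefficient. The names `encoder`, `rl_loss`, and `ssl_weight` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def random_shift(obs, pad=4):
    """Pad-and-crop image augmentation (one common choice for pixel RL)."""
    n, c, h, w = obs.shape
    padded = F.pad(obs, (pad, pad, pad, pad), mode="replicate")
    top = torch.randint(0, 2 * pad + 1, (1,)).item()
    left = torch.randint(0, 2 * pad + 1, (1,)).item()
    return padded[:, :, top:top + h, left:left + w]

def contrastive_loss(encoder, obs, temperature=0.1):
    """Simplified InfoNCE: two augmented views of each image are positives."""
    z1 = F.normalize(encoder(random_shift(obs)), dim=-1)
    z2 = F.normalize(encoder(random_shift(obs)), dim=-1)
    logits = z1 @ z2.t() / temperature                      # (N, N) similarities
    labels = torch.arange(obs.size(0), device=obs.device)   # diagonal = positives
    return F.cross_entropy(logits, labels)

def joint_loss(encoder, obs, rl_loss, ssl_weight=1.0):
    """Joint objective: the RL loss plus a weighted SSL loss on the shared encoder."""
    return rl_loss + ssl_weight * contrastive_loss(encoder, obs)
```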