Unsupervised object-centric representation (OCR) learning has recently drawn attention as a new paradigm of visual representation, owing to its potential to serve as an effective pre-training technique for various downstream tasks in terms of sample efficiency, systematic generalization, and reasoning. Although image-based reinforcement learning (RL) is one of the most frequently mentioned of these downstream tasks, its benefits for RL have, surprisingly, not yet been investigated systematically. Instead, most evaluations have focused on rather indirect metrics such as segmentation quality and object property prediction accuracy. In this paper, we investigate the effectiveness of OCR pre-training for image-based RL through empirical experiments. For systematic evaluation, we introduce a simple object-centric visual RL benchmark and conduct experiments to answer questions such as ``Does OCR pre-training improve performance on object-centric tasks?'' and ``Can OCR pre-training help with out-of-distribution generalization?''. Our results provide empirical evidence and insights into the effectiveness of OCR pre-training for RL, as well as its potential limitations in certain scenarios. Additionally, this study examines critical aspects of incorporating OCR pre-training into RL, including performance in a visually complex environment and the choice of pooling layer for aggregating the object representations.
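To make the pooling question concrete: an OCR encoder typically emits a set of K slot vectors per frame, while a standard RL policy head expects a single fixed-size feature, so some permutation-invariant aggregation is needed. The sketch below (an illustration, not the paper's implementation; the encoder output and the learned query are stand-in random arrays) contrasts two common choices, mean pooling and single-query attention pooling:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a pre-trained OCR encoder yields K slot vectors of
# dimension D per frame; the RL policy needs one fixed-size feature vector.
K, D = 5, 32
slots = rng.standard_normal((K, D))  # stand-in for encoder output

# Option 1: order-invariant mean pooling over slots.
mean_pooled = slots.mean(axis=0)     # shape (D,)

# Option 2: attention pooling with a single (here random, normally learned)
# query vector q, i.e. a softmax-weighted average of the slots.
q = rng.standard_normal(D)
scores = slots @ q / np.sqrt(D)      # (K,) scaled dot-product scores
weights = np.exp(scores - scores.max())
weights /= weights.sum()             # softmax over the K slots
attn_pooled = weights @ slots        # shape (D,)

# Both aggregates are invariant to slot ordering and have the same shape,
# so either can feed the downstream policy network.
assert mean_pooled.shape == (D,) and attn_pooled.shape == (D,)
```

Mean pooling treats all slots equally, whereas attention pooling can learn to emphasize task-relevant objects; which works better is exactly the kind of question the study's experiments probe.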