Unsupervised object-centric representation (OCR) learning has recently drawn attention as a new paradigm of visual representation learning, owing to its potential to serve as an effective pre-training technique for various downstream tasks in terms of sample efficiency, systematic generalization, and reasoning. Although image-based reinforcement learning (RL) is one of the most important and most frequently cited of these downstream tasks, its benefits from OCR pre-training have surprisingly not been investigated systematically thus far. Instead, most evaluations have focused on rather indirect metrics such as segmentation quality and object property prediction accuracy. In this paper, we empirically investigate the effectiveness of OCR pre-training for image-based RL. For systematic evaluation, we introduce a simple object-centric visual RL benchmark and conduct experiments to answer questions such as ``Does OCR pre-training improve performance on object-centric tasks?'' and ``Can OCR pre-training help with out-of-distribution generalization?''. Our results provide empirical evidence and valuable insights into the effectiveness of OCR pre-training for RL and the potential limitations of its use in certain scenarios. Additionally, this study examines critical aspects of incorporating OCR pre-training in RL, including performance in a visually complex environment and the appropriate pooling layer for aggregating the object representations.