Learning representations for pixel-based control has garnered significant attention recently in reinforcement learning. A wide range of methods have been proposed to enable efficient learning, leading to sample complexities similar to those in the full-state setting. However, moving beyond carefully curated pixel data sets (centered crop, appropriate lighting, clear background, etc.) remains challenging. In this paper, we adopt a more difficult setting, incorporating background distractors, as a first step towards addressing this challenge. We present a simple baseline approach that can learn meaningful representations with no metric-based learning, no data augmentations, no world-model learning, and no contrastive learning. We then analyze when and why previously proposed methods are likely to fail, or to reduce to the same performance as the baseline, in this harder setting, and why we should think carefully about extending such methods beyond well-curated environments. Our results show that a finer categorization of benchmarks, based on characteristics such as reward density, the planning horizon of the problem, and the presence of task-irrelevant components, is crucial in evaluating algorithms. Based on these observations, we propose different metrics to consider when evaluating an algorithm on benchmark tasks. We hope such a data-centric view can motivate researchers to rethink representation learning when investigating how best to apply RL to real-world tasks.