Recent work has shown that representation learning plays a critical role in sample-efficient reinforcement learning (RL) from pixels. Unfortunately, in real-world scenarios, representation learning is usually fragile to task-irrelevant distractions such as variations in background or viewpoint. To tackle this problem, we propose a novel clustering-based approach, namely Clustering with Bisimulation Metrics (CBM), which learns robust representations by grouping visual observations in the latent space. Specifically, CBM alternates between two steps: (1) grouping observations by measuring their bisimulation distances to the learned prototypes; (2) learning a set of prototypes according to the current cluster assignments. Computing cluster assignments with bisimulation metrics enables CBM to capture task-relevant information, as bisimulation metrics quantify the behavioral similarity between observations. Moreover, CBM encourages the consistency of representations within each group, which facilitates filtering out task-irrelevant information and thus induces representations robust to distractions. An appealing feature is that CBM achieves sample-efficient representation learning even when multiple distractions exist simultaneously. Experiments demonstrate that CBM significantly improves the sample efficiency of popular visual RL algorithms and achieves state-of-the-art performance in both multiple- and single-distraction settings. The code is available at https://github.com/MIRALab-USTC/RL-CBM.
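To make the alternating procedure concrete, below is a minimal toy sketch (not the authors' implementation, which is available at the repository above). It assumes a DBC-style approximation of the bisimulation distance, d(x, p) ≈ |r_x − r_p| + γ·||μ_x − μ_p||, where r denotes the reward and μ the predicted next latent state; all names (NUM_PROTOTYPES, assign_clusters, update_prototypes, ...) are hypothetical placeholders, and the data is random noise standing in for encoded observations.

```python
# Toy sketch of alternating between (1) bisimulation-based cluster assignment
# and (2) prototype updates, under the assumptions stated above.
import numpy as np

GAMMA = 0.99
NUM_PROTOTYPES = 8
LATENT_DIM = 16
rng = np.random.default_rng(0)


def bisim_distance(r_obs, mu_obs, r_proto, mu_proto):
    """Approximate bisimulation distance between observations and one prototype."""
    reward_term = np.abs(r_obs - r_proto)
    transition_term = np.linalg.norm(mu_obs - mu_proto, axis=-1)
    return reward_term + GAMMA * transition_term


def assign_clusters(rewards, next_latents, proto_rewards, proto_latents):
    """Step (1): group observations by bisimulation distance to each prototype."""
    dists = np.stack([
        bisim_distance(rewards, next_latents, proto_rewards[k], proto_latents[k])
        for k in range(NUM_PROTOTYPES)
    ], axis=1)                      # shape: (batch, num_prototypes)
    return dists.argmin(axis=1)     # hard assignment to the closest prototype


def update_prototypes(rewards, next_latents, assignments,
                      proto_rewards, proto_latents):
    """Step (2): refit each prototype from the observations assigned to it."""
    for k in range(NUM_PROTOTYPES):
        mask = assignments == k
        if mask.any():
            proto_rewards[k] = rewards[mask].mean()
            proto_latents[k] = next_latents[mask].mean(axis=0)
    return proto_rewards, proto_latents


# Alternating loop on random stand-in data.
rewards = rng.normal(size=256)
next_latents = rng.normal(size=(256, LATENT_DIM))
proto_rewards = rng.normal(size=NUM_PROTOTYPES)
proto_latents = rng.normal(size=(NUM_PROTOTYPES, LATENT_DIM))

for _ in range(10):
    assignments = assign_clusters(rewards, next_latents, proto_rewards, proto_latents)
    proto_rewards, proto_latents = update_prototypes(
        rewards, next_latents, assignments, proto_rewards, proto_latents)
```

In the full method, the encoder, reward model, and transition model would be learned jointly with the prototypes, and the within-cluster consistency of representations is what filters out task-irrelevant information.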