Generalization across different environments with the same underlying task is critical for deploying visual reinforcement learning (RL) in real-world scenarios. However, visual distractions, which are common in real scenes, can corrupt the representations learned from high-dimensional observations and thus degrade generalization performance. To tackle this problem, we propose a novel approach, Characteristic Reward Sequence Prediction (CRESP), which extracts task-relevant information by learning reward sequence distributions (RSDs), since reward signals in RL are task-relevant and invariant to visual distractions. Specifically, to capture task-relevant information via RSDs effectively, CRESP introduces an auxiliary task, predicting the characteristic functions of RSDs, to learn task-relevant representations, because these high-dimensional distributions can be well approximated through their characteristic functions. Experiments demonstrate that CRESP significantly improves generalization to unseen environments, outperforming several state-of-the-art methods on DeepMind Control tasks with different visual distractions.
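To make the characteristic-function idea concrete, the following is a minimal sketch (not the authors' implementation) of the empirical characteristic function of a reward-sequence distribution. It evaluates phi(u) = E[exp(i * u . R)] over sampled reward sequences R; the function name and toy data are illustrative assumptions.

```python
import numpy as np

def empirical_char_fn(reward_seqs, u):
    """Empirical characteristic function of a reward-sequence distribution.

    reward_seqs: (N, T) array, N sampled reward sequences of length T.
    u: (T,) frequency vector at which to evaluate phi(u) = E[exp(i * u . R)].
    Returns the (real, imag) parts of the empirical estimate.
    """
    phase = reward_seqs @ u            # (N,) inner products u . R
    return np.cos(phase).mean(), np.sin(phase).mean()

# Toy check: for a deterministic reward sequence, |phi(u)| = 1 for all u.
R = np.tile(np.array([1.0, 0.0, 1.0]), (5, 1))   # 5 identical sequences
u = np.array([0.3, -0.2, 0.5])
re, im = empirical_char_fn(R, u)
print(round(re**2 + im**2, 6))  # -> 1.0
```

In an auxiliary-task setting along the lines described above, a representation network would be trained to regress such (real, imag) targets at sampled frequency vectors u, so that features predictive of the RSD, and hence task-relevant, are retained.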