Deep reinforcement learning (DRL) agents are often sensitive to visual changes that were unseen in their training environments. To address this problem, we introduce a robust representation learning approach for RL: an auxiliary objective based on the multi-view information bottleneck (MIB) principle that encourages representations which are predictive of the future and insensitive to task-irrelevant distractions. This enables us to train high-performance policies that are robust to visual distractions and generalize to unseen environments. We demonstrate that our approach achieves state-of-the-art performance on challenging visual control tasks, even when the background is replaced with natural videos. In addition, we show that our approach outperforms well-established baselines on generalization to unseen environments on the large-scale Procgen benchmark.
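As a hedged illustration only (not the authors' implementation), MIB-style objectives are commonly approximated by maximizing a contrastive (InfoNCE) estimate of the mutual information between representations of two views of the same state, while penalizing view-specific information with a symmetrized KL term between the view-conditional representation distributions. The function names, Gaussian parameterization, and `beta` weight below are illustrative assumptions:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE) lower bound on I(z1; z2).
    Matching rows of z1/z2 are positive pairs; other rows are negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(np.mean(np.diag(log_probs)))    # higher = more shared information

def gaussian_kl(mu1, logvar1, mu2, logvar2):
    """KL(N(mu1, var1) || N(mu2, var2)), summed over dims, averaged over batch."""
    var1, var2 = np.exp(logvar1), np.exp(logvar2)
    kl = 0.5 * (logvar2 - logvar1 + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)
    return float(np.mean(kl.sum(axis=1)))

def mib_loss(z1, z2, mu1, logvar1, mu2, logvar2, beta=1e-3):
    """Illustrative MIB-style objective: maximize shared information across
    views, penalize view-specific information via a symmetrized KL."""
    skl = 0.5 * (gaussian_kl(mu1, logvar1, mu2, logvar2)
                 + gaussian_kl(mu2, logvar2, mu1, logvar1))
    return -info_nce(z1, z2) + beta * skl

# Toy usage: two noisy "views" (e.g. differently distracted renderings) of
# the same underlying batch of states.
rng = np.random.default_rng(0)
mu = rng.normal(size=(8, 16))
z1 = mu + 0.05 * rng.normal(size=(8, 16))
z2 = mu + 0.05 * rng.normal(size=(8, 16))
loss = mib_loss(z1, z2, z1, np.zeros_like(z1), z2, np.zeros_like(z2))
print(loss)
```

In a full agent this auxiliary loss would be added to the RL objective, so the encoder is pushed to keep the shared, task-relevant signal and discard the view-specific distractions.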