We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying on either domain knowledge or pixel reconstruction. Our goal is to learn representations that both support effective downstream control and remain invariant to task-irrelevant details. Bisimulation metrics quantify behavioral similarity between states in continuous MDPs; we propose using them to learn robust latent representations that encode only the task-relevant information from observations. Our method trains encoders such that distances in latent space equal bisimulation distances in state space. We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks, where the background is replaced with moving distractors and natural videos, while achieving state-of-the-art performance. We also test a first-person highway driving task where our method learns invariance to clouds, weather, and time of day. Finally, we provide generalization results drawn from properties of bisimulation metrics, and links to causal inference.
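The abstract describes the training idea only at a high level. As a rough illustration, not the paper's actual objective, the PyTorch-style sketch below shows one way an encoder could be trained so that L1 distances between latent codes match a bisimulation-style target combining reward differences with a discounted distance between predicted next-latent distributions. All names and modeling choices here (`bisimulation_loss`, the diagonal-Gaussian dynamics assumption, `discount`) are hypothetical.

```python
import torch
import torch.nn.functional as F

def bisimulation_loss(z_i, z_j, r_i, r_j, mu_i, sigma_i, mu_j, sigma_j, discount=0.99):
    """Sketch of an encoder objective: make L1 distances between latent codes
    match a bisimulation-style target built from the reward difference and a
    discounted 2-Wasserstein distance between predicted next-latent
    distributions (closed form for diagonal Gaussians)."""
    latent_dist = torch.abs(z_i - z_j).sum(dim=-1)      # ||z_i - z_j||_1
    reward_dist = torch.abs(r_i - r_j)                  # |r_i - r_j|
    # W2 between diagonal Gaussians N(mu_i, sigma_i^2) and N(mu_j, sigma_j^2)
    w2 = torch.sqrt(((mu_i - mu_j) ** 2).sum(-1) + ((sigma_i - sigma_j) ** 2).sum(-1))
    target = reward_dist + discount * w2
    return F.mse_loss(latent_dist, target.detach())     # regress latent distance onto target

# Hypothetical usage: z_* are encoder outputs for two sampled observations,
# r_* their rewards, and (mu_*, sigma_*) a latent dynamics model's predictions.
batch, dim = 32, 50
z_i = torch.randn(batch, dim, requires_grad=True)
z_j = torch.randn(batch, dim)
r_i, r_j = torch.randn(batch), torch.randn(batch)
mu_i, mu_j = torch.randn(batch, dim), torch.randn(batch, dim)
sigma_i, sigma_j = torch.rand(batch, dim), torch.rand(batch, dim)
loss = bisimulation_loss(z_i, z_j, r_i, r_j, mu_i, sigma_i, mu_j, sigma_j)
loss.backward()
```

In this sketch the target is detached so that only the encoder (and, in a fuller implementation, the reward and dynamics models through their own losses) is updated by this term; that detail is an assumption of the illustration rather than a claim about the paper.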