Modeling the world can benefit robot learning by providing a rich training signal for shaping an agent's latent state space. However, learning world models in unconstrained environments over high-dimensional observation spaces such as images is challenging. One source of difficulty is the presence of irrelevant but hard-to-model background distractions and unimportant visual details of task-relevant entities. We address this issue by learning a recurrent latent dynamics model that contrastively predicts the next observation. This simple model leads to surprisingly robust robotic control even under simultaneous camera, background, and color distractions. We outperform alternatives such as bisimulation methods, which impose state-similarity measures derived from divergence in future rewards or future optimal actions. We obtain state-of-the-art results on the Distracting Control Suite, a challenging benchmark for pixel-based robotic control.
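The core objective sketched above is contrastive next-observation prediction: the latent predicted by the dynamics model should score the embedding of the true next observation above embeddings of other observations in the batch. Below is a minimal, dependency-free sketch of such an InfoNCE-style loss on toy vectors; the function names and the use of plain dot-product similarity are illustrative assumptions, not the paper's exact architecture (which uses a recurrent model over image embeddings).

```python
import math

def dot(a, b):
    # similarity between a predicted latent and a candidate embedding
    return sum(x * y for x, y in zip(a, b))

def info_nce(z_pred, candidates, pos_index, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative sketch).

    z_pred: latent predicted by the dynamics model.
    candidates: embeddings of possible next observations; exactly one
        (at pos_index) is the true next observation, the rest are
        negatives drawn from the batch.
    Returns -log softmax probability assigned to the positive.
    """
    logits = [dot(z_pred, z) / temperature for z in candidates]
    m = max(logits)  # numerical stability for the log-sum-exp
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[pos_index]

# toy check: a prediction aligned with the positive incurs a small loss,
# while treating a misaligned candidate as the positive incurs a large one
z_pred = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss_aligned = info_nce(z_pred, candidates, pos_index=0)
loss_wrong = info_nce(z_pred, candidates, pos_index=2)
```

Minimizing this loss shapes the latent space so that only features predictive of future observations survive, which is one intuition for why hard-to-model background distractors get discarded.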