Current deep reinforcement learning (RL) approaches incorporate minimal prior knowledge about the environment, limiting computational and sample efficiency. \textit{Objects} provide a succinct and causal description of the world, and many recent works have proposed unsupervised object representation learning using priors and losses over static object properties like visual consistency. However, object dynamics and interactions are also critical cues for objectness. In this paper we propose a framework for reasoning about object dynamics and behavior to rapidly determine minimal and task-specific object representations. To demonstrate the need to reason over object behavior and dynamics, we introduce a suite of RGBD MuJoCo object collection and avoidance tasks that, while intuitive and visually simple, confound state-of-the-art unsupervised object representation learning algorithms. We also highlight the potential of this framework on several Atari games, using our object representation and standard RL and planning algorithms to learn dramatically faster than existing deep RL algorithms.