The ability to separate signal from noise, and to reason with clean abstractions, is critical to intelligence. With this ability, humans can efficiently perform real-world tasks without considering all possible nuisance factors. How can artificial agents do the same? What kind of information can agents safely discard as noise? In this work, we categorize information in the wild into four types based on controllability and relation to reward, and formulate useful information as that which is both controllable and reward-relevant. This framework clarifies the kinds of information removed by various prior works on representation learning in reinforcement learning (RL), and leads to our proposed approach of learning a Denoised MDP that explicitly factors out certain noise distractors. Extensive experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone, and over prior works, across policy optimization control tasks as well as the non-control task of joint position regression.
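The four-way categorization described in the abstract can be pictured as a 2x2 grid over two binary properties. The following is only a minimal illustrative sketch, not the paper's method: the `Factor` class, the `is_useful` and `categorize` helpers, and the example factors are all hypothetical names and made-up data introduced here to show the classification rule (useful = controllable and reward-relevant).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    """One piece of information in an observation (hypothetical container)."""
    name: str
    controllable: bool      # can the agent's actions influence it?
    reward_relevant: bool   # does it bear on the reward?


def is_useful(factor: Factor) -> bool:
    """Useful information is both controllable and reward-relevant."""
    return factor.controllable and factor.reward_relevant


def categorize(factors):
    """Group factor names into the four (controllable, reward_relevant) cells."""
    cells = {(c, r): [] for c in (True, False) for r in (True, False)}
    for f in factors:
        cells[(f.controllable, f.reward_relevant)].append(f.name)
    return cells


# Made-up example factors, for illustration only (not taken from the paper).
EXAMPLE_FACTORS = [
    Factor("robot joint angles", True, True),
    Factor("a light the agent can toggle, unrelated to the task", True, False),
    Factor("an externally moved target", False, True),
    Factor("background video", False, False),
]

# Only the factors in the (controllable, reward-relevant) cell are kept.
kept = [f.name for f in EXAMPLE_FACTORS if is_useful(f)]
print(kept)
```

In this toy view, the other three cells are the candidates for being discarded as noise; how a learned world model actually separates them is the subject of the paper itself and is not captured by this sketch.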