High-dimensional observations are a major challenge in the application of model-based reinforcement learning (MBRL) to real-world environments. To handle high-dimensional sensory inputs, existing approaches use representation learning to map high-dimensional observations into a lower-dimensional latent space that is more amenable to dynamics estimation and planning. In this work, we present an information-theoretic approach that employs temporal predictive coding to encode elements in the environment that can be predicted across time. Because this approach focuses on encoding temporally predictable information, it implicitly prioritizes task-relevant components over nuisance information in the environment that is provably task-irrelevant. By learning this representation in conjunction with a recurrent state-space model, we can then perform planning in latent space. We evaluate our model on a challenging modification of standard DMControl tasks in which the background is replaced with natural videos containing complex information that is irrelevant to the planning task. Our experiments show that our model outperforms existing methods in this complex-background setting while remaining competitive with current state-of-the-art models in the standard setting.