Model-based reinforcement learning (RL) algorithms designed for handling complex visual observations typically learn some form of latent state representation, either explicitly or implicitly. Standard methods of this sort do not distinguish between functionally relevant aspects of the state and irrelevant distractors, instead aiming to represent all available information equally. We propose a modified objective for model-based RL that, in combination with mutual information maximization, allows us to learn representations and dynamics for visual model-based RL without reconstruction in a way that explicitly prioritizes functionally relevant factors. The key principle behind our design is to integrate a term inspired by variational empowerment into a state-space model based on mutual information. This term prioritizes information that is correlated with action, thus ensuring that functionally relevant factors are captured first. Furthermore, the same empowerment term also promotes faster exploration during the RL process, especially for sparse-reward tasks where the reward signal is insufficient to drive exploration in the early stages of learning. We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds, and show that the proposed prioritized information objective outperforms state-of-the-art model-based RL approaches, achieving higher sample efficiency and episodic returns. Project website: https://sites.google.com/view/information-empowerment
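As a rough illustration (the notation below is ours and not necessarily the paper's exact formulation), the core idea can be sketched as augmenting a reconstruction-free mutual-information representation objective with a variational empowerment term that prioritizes latent factors correlated with the agent's actions:

```latex
% Illustrative sketch only: z_t denotes a latent state, o_t an observation,
% a_t an action, and \lambda a trade-off weight; these symbols are assumptions,
% not the paper's notation.
\begin{align}
  \max_{\phi}\;
    \underbrace{I_{\phi}(z_t;\, o_t)}_{\text{MI representation term (no reconstruction)}}
    \;+\; \lambda\,
    \underbrace{I_{\phi}(a_t;\, z_{t+1} \mid z_t)}_{\text{empowerment: action-correlated factors}} .
\end{align}
% The empowerment term admits a standard variational lower bound, e.g. with a
% learned inverse model q(a_t | z_t, z_{t+1}):
\begin{align}
  I(a_t;\, z_{t+1} \mid z_t) \;\ge\;
    \mathbb{E}\big[\log q(a_t \mid z_t, z_{t+1})\big]
    \;+\; \mathcal{H}(a_t \mid z_t).
\end{align}
```

Maximizing such a bound encourages the latent state to retain exactly the information needed to infer which action was taken, which is one hedged reading of how action-correlated (functionally relevant) factors get captured first.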