Structured World Belief for Reinforcement Learning in POMDP

Object-centric world models provide a structured representation of the scene and can be an important backbone in reinforcement learning and planning. However, existing approaches suffer in partially observable environments due to the lack of belief states. In this paper, we propose Structured World Belief, a model for learning and inference of object-centric belief states. Inferred by Sequential Monte Carlo (SMC), our belief states provide multiple object-centric scene hypotheses. To synergize the benefits of SMC particles with object representations, we also propose a new object-centric dynamics model that considers the inductive bias of object permanence. This enables tracking of object states even when they are invisible for a long time. To further facilitate object tracking in this regime, we allow our model to attend flexibly to any spatial location in the image, a capability that was restricted in previous models. In experiments, we show that object-centric belief provides more accurate and robust performance for filtering and generation. Furthermore, we show the efficacy of structured world belief in improving the performance of reinforcement learning, planning and supervised reasoning.
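As a rough illustration of how SMC can keep multiple hypotheses about an object alive while it is out of view, the sketch below runs a generic bootstrap particle filter on a single object moving along a line. It is not the model proposed in the paper: the 1-D state, the constant-velocity dynamics, the Gaussian noise levels, and names such as `particle_filter`, `obs_std` and `proc_std` are all assumptions made for this example.

```python
import numpy as np

def particle_filter(observations, num_particles=500, obs_std=0.5,
                    proc_std=0.1, seed=0):
    """Bootstrap particle filter over (position, velocity) of one object.

    observations: list of floats, or None when the object is occluded.
    Returns the posterior mean position at every time step.
    All modelling choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Each particle is one hypothesis about the object's state.
    pos = rng.normal(0.0, 1.0, num_particles)
    vel = rng.normal(1.0, 0.5, num_particles)
    weights = np.full(num_particles, 1.0 / num_particles)
    means = []
    for obs in observations:
        # Predict: propagate every hypothesis through the dynamics.
        vel = vel + rng.normal(0.0, proc_std, num_particles)
        pos = pos + vel + rng.normal(0.0, proc_std, num_particles)
        if obs is not None:
            # Update: reweight by a Gaussian observation likelihood.
            log_w = np.log(weights) - 0.5 * ((obs - pos) / obs_std) ** 2
            log_w -= log_w.max()
            weights = np.exp(log_w)
            weights /= weights.sum()
            # Resample to concentrate particles on likely hypotheses.
            idx = rng.choice(num_particles, size=num_particles, p=weights)
            pos, vel = pos[idx], vel[idx]
            weights = np.full(num_particles, 1.0 / num_particles)
        # If the object is occluded, no update happens: the particles
        # persist and keep moving, so the belief simply widens.
        means.append(float(np.sum(weights * pos)))
    return means

# The object moves one unit per step; it is visible for 5 steps, then hidden.
true_positions = [float(t) for t in range(1, 11)]
observations = [p if t < 5 else None for t, p in enumerate(true_positions)]
estimates = particle_filter(observations)
```

During the occluded steps the filter never discards the object; the particles continue under the dynamics model, which is a simple stand-in for the object-permanence bias described above.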

