In recent years, several learning approaches to point goal navigation in previously unseen environments have been proposed. They vary in their representations of the environment, problem decomposition, and experimental evaluation. In this work, we compare state-of-the-art Deep Reinforcement Learning based approaches with a Partially Observable Markov Decision Process (POMDP) formulation of the point goal navigation problem. We adapt the POMDP sub-goal framework proposed by [1] and modify the component that estimates frontier properties so that it uses partial semantic maps of indoor scenes built from semantic segmentation of images. In addition to the well-known completeness of the model-based approach, we demonstrate that it is robust and efficient compared to an optimistic frontier-based planner, because it leverages informative, learned properties of the frontiers. We also demonstrate its data efficiency compared to end-to-end deep reinforcement learning approaches. We compare our results against an optimistic planner, ANS, and DD-PPO on the Matterport3D dataset using the Habitat Simulator. We show comparable, though slightly worse, performance than the SOTA DD-PPO approach, while using far less data.