An embodied task such as embodied question answering (EmbodiedQA) requires an agent to explore the environment and collect clues to answer a given question about specific objects in the scene. A solution to such a task usually involves two stages: a navigator and a visual Q&A module. In this paper, we focus on navigation and address the lack of experience and common sense in existing navigation algorithms, which essentially causes the robot to fail to find the target when it is spawned in an unknown environment. Inspired by the human ability to think twice before moving and to conceive several feasible paths toward a goal in unfamiliar scenes, we present a route planning method named the Path Estimation and Memory Recalling (PEMR) framework. PEMR includes a ``looking ahead'' process, \textit{i.e.}, a visual feature extractor module that estimates feasible paths for gathering 3D navigational information, mimicking the human sense of direction. PEMR also contains a ``looking behind'' process, a memory recall mechanism that aims to fully leverage the past experience collected by the feature extractor. Last but not least, to encourage the navigator to learn more accurate prior expert experience, we improve the original benchmark dataset and provide a family of evaluation metrics for diagnosing both the navigation and question answering modules. We show strong experimental results for PEMR on the EmbodiedQA navigation task.