In this work, we study the problem of Embodied Referring Expression Grounding, where an agent needs to navigate a previously unseen environment and localize a remote object described by a concise, high-level natural language instruction. When facing such a situation, a human tends to imagine what the destination may look like and to explore the environment based on prior knowledge of the environmental layout, such as the fact that a bathroom is more likely to be found near a bedroom than near a kitchen. To mimic this cognitive decision process, we design an autonomous agent called Layout-aware Dreamer (LAD), which includes two novel modules: the Layout Learner and the Goal Dreamer. The Layout Learner learns to infer the room category distribution of neighboring unexplored areas along the path for coarse layout estimation, which effectively introduces layout common sense of room-to-room transitions to our agent. To learn effective exploration of the environment, the Goal Dreamer imagines the destination beforehand. Our agent achieves new state-of-the-art performance on the public leaderboard of the REVERIE dataset in challenging unseen test environments, improving navigation success (SR) by 4.02% and remote grounding success (RGS) by 3.43% over the previous state of the art. The code is released at https://github.com/zehao-wang/LAD