In this paper, we propose a novel Knowledge-based Embodied Question Answering (K-EQA) task, in which the agent intelligently explores the environment to answer various questions using external knowledge. Unlike existing EQA work, in which the question explicitly specifies the target object, the agent can draw on external knowledge to understand more complicated questions such as "Please tell me what are objects used to cut food in the room?", which requires the agent to know facts such as "knife is used for cutting food". To address the K-EQA problem, we propose a novel framework based on neural program synthesis reasoning, in which joint reasoning over the external knowledge and a 3D scene graph is performed to realize navigation and question answering. In particular, the 3D scene graph serves as a memory that stores the visual information of visited scenes, which significantly improves the efficiency of multi-turn question answering. Experimental results demonstrate that the proposed framework is capable of answering more complicated and realistic questions in the embodied environment. The proposed method is also applicable to multi-agent scenarios.
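The idea of jointly reasoning over external knowledge and a scene memory can be illustrated with a minimal sketch. This is not the authors' framework: it replaces neural program synthesis and the 3D scene graph with a hand-written lookup, and the names `KNOWLEDGE`, `SceneGraph`, and `answer` are hypothetical, introduced only for illustration. It shows how a question such as "what objects are used to cut food?" might be resolved by combining a knowledge fact with the objects the agent has already observed.

```python
from dataclasses import dataclass, field

# Hypothetical external knowledge: (object, relation) -> set of values.
# These facts are illustrative assumptions, not taken from the paper's data.
KNOWLEDGE = {
    ("knife", "used_for"): {"cutting food"},
    ("scissors", "used_for"): {"cutting paper"},
    ("cup", "used_for"): {"drinking"},
}


@dataclass
class SceneGraph:
    """Toy stand-in for a scene memory: objects observed so far, per room."""
    rooms: dict = field(default_factory=dict)

    def observe(self, room, obj):
        # Record that `obj` was seen in `room`.
        self.rooms.setdefault(room, set()).add(obj)

    def objects_in(self, room):
        return self.rooms.get(room, set())


def answer(graph, room, relation, value):
    """Return the sorted objects in `room` whose knowledge entry matches."""
    return sorted(
        obj
        for obj in graph.objects_in(room)
        if value in KNOWLEDGE.get((obj, relation), set())
    )


# The agent explores once and stores what it saw.
graph = SceneGraph()
for obj in ("knife", "cup", "table"):
    graph.observe("kitchen", obj)

# Later questions reuse the stored observations without re-exploring.
print(answer(graph, "kitchen", "used_for", "cutting food"))
print(answer(graph, "kitchen", "used_for", "drinking"))
```

Because the observations persist in `graph`, the second question is answered from memory alone, which is the intuition behind using a scene graph for multi-turn question answering.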