With the progress of Embodied AI in recent years, embodied agents are expected to perform more complicated tasks in interactive environments. Existing embodied tasks, including Embodied Referring Expression (ERE) and other QA-form tasks, mainly focus on interaction in terms of linguistic instruction. Enabling the agent to actively manipulate objects in the environment for exploration has therefore become a challenging problem for the community. To address this problem, we introduce a new embodied task, Remote Embodied Manipulation Question Answering (REMQA), which combines ERE with manipulation tasks. In the REMQA task, the agent needs to navigate to a remote position and manipulate the target object to answer the question. We build a benchmark dataset for the REMQA task in the AI2-THOR simulator. For this task, we propose a framework with 3D semantic reconstruction and modular network paradigms. We evaluate the proposed framework on the REMQA dataset to validate its effectiveness.