In this paper, we propose a novel task, Manipulation Question Answering (MQA), in which a robot performs manipulation actions to change the environment in order to answer a given question. To solve this problem, we propose a framework consisting of a QA module and a manipulation module. For the QA module, we adopt a method developed for the Visual Question Answering (VQA) task. For the manipulation module, we design a Deep Q Network (DQN) model that generates manipulation actions for the robot to interact with the environment. We consider the setting in which the robot continuously manipulates objects inside a bin until the answer to the question is found. In addition, we establish a novel dataset in a simulation environment that contains a variety of object models, scenarios, and corresponding question-answer pairs. Extensive experiments have been conducted to validate the effectiveness of the proposed framework.
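The manipulate-until-answerable loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `ToyBin`, `qa_module`, `choose_action`, and `answer_by_manipulation` are hypothetical names, the scene is a toy pile of occluded objects rather than a simulated bin, the QA module is a simple lookup rather than a VQA model, and the Q-values are constant placeholders where a trained DQN would supply predictions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyBin:
    """Hypothetical stand-in for the bin: some objects start occluded."""
    hidden: list
    visible: list = field(default_factory=list)

    def push(self, index):
        """Toy manipulation action: reveal one occluded object, if any remain."""
        if self.hidden:
            self.visible.append(self.hidden.pop(index % len(self.hidden)))


def qa_module(question_object, scene):
    """Toy QA module for the question 'is <object> in the bin?'.

    Returns "yes" once the object is visible, "no" once nothing is
    occluded any more, and None while the scene is still ambiguous.
    """
    if question_object in scene.visible:
        return "yes"
    if not scene.hidden:
        return "no"
    return None


def choose_action(q_values, epsilon, rng):
    """Epsilon-greedy action selection over a list of Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def answer_by_manipulation(question_object, scene, max_steps=20,
                           epsilon=0.1, seed=0):
    """Alternate QA attempts and manipulation until an answer is found."""
    rng = random.Random(seed)
    for step in range(max_steps):
        answer = qa_module(question_object, scene)
        if answer is not None:
            return answer, step
        # Placeholder Q-values (assumption): a trained DQN would predict
        # these from the current observation of the scene.
        q_values = [0.0, 0.0, 0.0, 0.0]
        scene.push(choose_action(q_values, epsilon, rng))
    return qa_module(question_object, scene), max_steps


if __name__ == "__main__":
    scene = ToyBin(hidden=["cup", "key", "ball"])
    print(answer_by_manipulation("key", scene))
```

The sketch keeps the two modules separate, mirroring the framework's split: the QA function only reads the scene, while the action-selection function only decides how to change it.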