The utility of collocated robots depends largely on easy and intuitive mechanisms for interacting with humans. If a robot accepts task instructions in natural language, it must first understand the user's intention by decoding the instruction. However, while executing the task, the robot may face unforeseen circumstances caused by variations in the observed scene, and it may therefore require further user intervention. In this article, we present a system called Talk-to-Resolve (TTR) that enables a robot to initiate a coherent dialogue exchange with the instructor, based on its visual observation of the scene, to resolve the impasse. Through dialogue, the robot either finds a cue to move forward with the original plan, finds an acceptable alternative to the original plan, or receives confirmation to abort the task altogether. To recognize a possible stalemate, we jointly use the dense captions of the observed scene and the given instruction to compute the robot's next action. We evaluate our system on a dataset of pairs of initial instructions and situational scenes. Our system identifies stalemates and resolves them through appropriate dialogue exchanges with 82% accuracy. Additionally, a user study reveals that the questions generated by our system are more natural (4.02 on average on a scale of 1 to 5) than those of a state-of-the-art system (3.08 on average).