We envision robots that can collaborate and communicate seamlessly with humans. Such robots must decide both what to say and how to act while interacting with humans. To this end, we introduce a new task, dialogue object search: A robot is tasked to search for a target object (e.g., fork) in a human environment (e.g., kitchen), while engaging in a "video call" with a remote human who has additional but inexact knowledge about the target's location. That is, the robot conducts speech-based dialogue with the human while sharing the image from its mounted camera. This task is challenging at multiple levels, from data collection, to algorithm and system development, to evaluation. Despite these challenges, we believe such a task lies on the critical path towards more intelligent and collaborative robots. In this extended abstract, we motivate and introduce the dialogue object search task and analyze examples collected from a pilot study. We then discuss our next steps and conclude with several challenges on which we hope to receive feedback.