We introduce a practical pipeline that interactively encodes multimodal human demonstrations for robot teaching. This pipeline serves as an input system for a framework called Learning-from-Observation (LfO), which aims to program household robots with manipulative tasks through few-shot human demonstrations, without coding. While most previous LfO systems rely on visual demonstrations alone, recent research on robot teaching has shown that verbal instruction is effective in making recognition robust and teaching interactive. To the best of our knowledge, however, no LfO system has yet been proposed that utilizes both verbal instruction and interaction, namely \textit{multimodal LfO}. This paper proposes the interactive task encoding system (ITES) as an input pipeline for multimodal LfO. ITES assumes that the user teaches step by step, pausing hand movements so that the granularity of human instructions matches the granularity of robot execution. ITES recognizes tasks from the step-by-step verbal instructions that accompany the hand movements. Additionally, recognition is made robust through interactions with the user. We test ITES on a real robot and show that the user can successfully teach multiple operations through multimodal demonstrations. The results suggest the usefulness of ITES for multimodal LfO. The source code is available at https://github.com/microsoft/symbolic-robot-teaching-interface.