Learning-from-Observation (LfO) is a robot teaching framework for programming operations through few-shot human demonstrations. While most previous LfO systems operate on visual demonstration, recent research on robot teaching has shown that verbal instruction is effective in making recognition robust and teaching interactive. To the best of our knowledge, however, few solutions have been proposed for LfO that utilizes verbal instruction, namely multimodal LfO. This paper proposes a practical pipeline for multimodal LfO. For input, a user temporarily stops hand movements to match the granularity of human instructions with the granularity of robot execution. The pipeline recognizes tasks based on step-by-step verbal instructions accompanied by demonstrations. In addition, the recognition is made robust through interactions with the user. We test the pipeline on a real robot and show that the user can successfully teach multiple operations from multimodal demonstrations. The results suggest the utility of the proposed pipeline for multimodal LfO.