Interactive Imitation Learning (IIL) is a branch of Imitation Learning (IL) in which human feedback is provided intermittently during robot execution, allowing the robot's behavior to be improved online. In recent years, IIL has increasingly carved out its own space as a promising data-driven alternative for solving complex robotic tasks. Its advantages are its data efficiency, as the human feedback guides the robot directly towards an improved behavior, and its robustness, as the distribution mismatch between the teacher and learner trajectories is minimized by providing feedback directly over the learner's trajectories. Nevertheless, despite the opportunities that IIL presents, its terminology, structure, and applicability are neither clear nor unified in the literature, which slows down its development and, therefore, research into innovative formulations and discoveries. In this article, we attempt to facilitate research in IIL and lower entry barriers for new practitioners by providing a survey of the field that unifies and structures it. In addition, we aim to raise awareness of its potential, what has been accomplished, and which research questions remain open. We organize the most relevant works in IIL in terms of human-robot interaction (i.e., types of feedback), interfaces (i.e., means of providing feedback), learning (i.e., models learned from feedback and function approximators), user experience (i.e., human perception about the learning process), applications, and benchmarks. Furthermore, we analyze similarities and differences between IIL and Reinforcement Learning (RL), discussing how the concepts of offline, online, off-policy, and on-policy learning should be transferred to IIL from the RL literature. We particularly focus on robotic applications in the real world and discuss their implications, limitations, and promising future areas of research.