Bridging Action Space Mismatch in Learning from Demonstrations

Learning from demonstrations (LfD) methods guide learning agents to a desired solution using demonstrations from a teacher. While some LfD methods can handle small mismatches between the action spaces of the teacher and the student, here we address the case where the teacher demonstrates the task in an action space that can differ substantially from that of the student, thereby inducing a large action space mismatch. We bridge this gap with a framework, Morphological Adaptation in Imitation Learning (MAIL), that allows training an agent from demonstrations by other agents whose morphologies differ significantly (from the student's or from each other's). MAIL is able to learn from suboptimal demonstrations, so long as they provide some guidance towards a desired solution. We demonstrate MAIL on challenging household cloth manipulation tasks and introduce a new DRY CLOTH task: a 3D cloth manipulation task with obstacles. In these tasks, we train a visual control policy for a robot with one end-effector using demonstrations from a simulated agent with two end-effectors. MAIL shows up to 27% improvement over LfD and non-LfD baselines. We deploy it on a real Franka Panda robot, where it handles multiple variations in cloth properties (color, thickness, size, material) and pose (rotation and translation). We further show that MAIL generalizes to transfer from n to m end-effectors in the context of a simple rearrangement task.

翻译：从示范学习中弥合行为空间不匹配问题学习自示范的方法（Learning from demonstrations，LfD）引导学习代理按照教师提供的示范获得最终解决方案。虽然某些LfD方法可以处理教师和学生之间的行为空间轻微不匹配的情况，但本文针对教师演示动作空间与学生显著不同的情况进行研究，以桥接行为空间不匹配问题。为此，我们提出了一种框架 Morphological Adaptation in Imitation Learning (MAIL)，该框架允许学习代理从具有显著不同形态（不同于学生或彼此）的其他代理的示范中进行训练。MAIL 可以从次优示范中学习，只要它们能够提供有关期望解决方案的一些指导。我们在具有挑战性的家庭用品布料操作任务中展示了 MAIL，引入了一个新的DRY CLOTH任务——在三维空间中操作布料并避开障碍物。在这些任务中，我们训练了一个视觉控制策略，该策略适用于具有一个末端执行器的机器人，使用模拟代理的示范进行训练，该模拟代理具有两个末端执行器。MAIL相比LfD和非LfD基线展现了27%的改善。通过实际应用到Franka Panda机器人，MAIL可以处理多种布料特性（颜色、厚度、大小、材料）和姿态（旋转和平移）的变化。我们进一步展示了在简单的重新排列任务中进行 n 到 m 末端执行器的转换的通用性。