Recent work in visual end-to-end learning for robotics has shown the promise of imitation learning across a variety of tasks. Such approaches are expensive both because they require large amounts of real-world training demonstrations and because identifying the best model to deploy in the real world requires time-consuming real-world evaluations. These challenges can be mitigated by simulation: by supplementing real-world data with simulated demonstrations and using simulated evaluations to identify high-performing policies. However, this introduces the well-known "reality gap" problem, where simulator inaccuracies decorrelate performance in simulation from performance in reality. In this paper, we build on top of prior work in GAN-based domain adaptation and introduce the notion of a Task Consistency Loss (TCL), a self-supervised loss that encourages sim and real alignment both at the feature and action-prediction levels. We demonstrate the effectiveness of our approach by teaching a mobile manipulator to autonomously approach a door, turn the handle to open the door, and enter the room. The policy performs control from RGB and depth images and generalizes to doors not encountered in the training data. We achieve 80% success across ten seen and unseen scenes using only ~16.2 hours of teleoperated demonstrations in sim and real. To the best of our knowledge, this is the first work to tackle latched door opening from a purely end-to-end learning approach, where the tasks of navigation and manipulation are jointly modeled by a single neural network.
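The abstract describes the TCL only at a high level. As a rough illustration of the idea, the sketch below shows one way such a loss could be computed: a simulated frame and its GAN-adapted counterpart are passed through a shared encoder and policy head, and disagreement is penalized at both the feature and action-prediction levels. All names (`encoder`, `policy_head`, the weights) are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of a Task Consistency Loss in PyTorch (illustrative only,
# not the paper's code). Assumes each simulated image has a GAN-adapted
# "real-looking" counterpart, and that encoder/policy_head are nn.Modules.
import torch
import torch.nn.functional as F


def task_consistency_loss(encoder, policy_head, sim_img, adapted_img,
                          feature_weight=1.0, action_weight=1.0):
    """Encourage sim/adapted pairs to agree at the feature and action levels."""
    sim_feat = encoder(sim_img)          # features from the simulated frame
    adapted_feat = encoder(adapted_img)  # features from its GAN-adapted twin

    sim_action = policy_head(sim_feat)          # action predicted from sim features
    adapted_action = policy_head(adapted_feat)  # action predicted from adapted features

    feat_loss = F.mse_loss(sim_feat, adapted_feat)
    action_loss = F.mse_loss(sim_action, adapted_action)
    return feature_weight * feat_loss + action_weight * action_loss
```

In such a setup the TCL term would typically be added to the usual imitation (behavior-cloning) loss during training, so the policy is pushed to produce the same features and actions regardless of whether it sees the simulated or the adapted image.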