Rearrangement tasks have been identified as a crucial challenge for intelligent robotic manipulation, but few existing methods allow precise construction of unseen target structures. We propose a visual foresight model for pick-and-place rearrangement manipulation that learns efficiently. In addition, we develop a multi-modal action proposal module that builds on the Goal-Conditioned Transporter Network, a state-of-the-art imitation learning method. Our image-based task planning method, Transporters with Visual Foresight (TVF), learns from only a handful of demonstrations and generalizes to multiple unseen tasks in a zero-shot manner. TVF improves the performance of a state-of-the-art imitation learning method on unseen tasks in both simulation and real-robot experiments. In particular, given only tens of expert demonstrations, the average success rate on unseen tasks improves from 55.4% to 78.5% in simulation and from 30% to 63.3% on the real robot. Video and code are available on our project website: https://chirikjianlab.github.io/tvf/