End-to-end learning for visual robotic manipulation is known to suffer from sample inefficiency, requiring large numbers of demonstrations. Spatial roto-translational equivariance, or SE(3)-equivariance, can be exploited to improve the sample efficiency of learning robotic manipulation. In this paper, we present SE(3)-equivariant models for visual robotic manipulation from point clouds that can be trained fully end-to-end. Using the representation theory of the Lie group SE(3), we construct novel SE(3)-equivariant energy-based models that enable highly sample-efficient end-to-end learning. We show that our models can be trained from scratch without prior knowledge and yet are highly sample efficient (5 to 10 demonstrations are enough). Furthermore, we show that our models generalize to tasks with (i) previously unseen target object poses, (ii) previously unseen target object instances of the same category, and (iii) previously unseen visual distractors. We validate the sample efficiency and generalizability of our models on 6-DoF robotic manipulation tasks. Code is available at: https://github.com/tomato1mule/edf
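To make the SE(3)-equivariance property concrete, the following is a minimal toy sketch (not the paper's actual model) of the invariance such an energy-based model must satisfy: if the energy E(T; P) over an end-effector pose T and a scene point cloud P is built only from SE(3)-invariant quantities such as pairwise distances, then E(gT; gP) = E(T; P) for any rigid transform g. All names here (random_se3, energy, the gripper keypoints) are hypothetical illustrations.

```python
# Toy numerical check of SE(3)-invariance: E(gT; gP) == E(T; P).
# This is an illustrative sketch, not the implementation from the paper.
import numpy as np

def random_se3(rng):
    """Sample a random rigid transform (R, t) via QR decomposition."""
    A = rng.standard_normal((3, 3))
    Q, Rq = np.linalg.qr(A)
    Q = Q * np.sign(np.diag(Rq))   # fix column signs
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1              # enforce det = +1 (proper rotation)
    t = rng.standard_normal(3)
    return Q, t

def energy(T, scene):
    """Toy energy: soft-min distance between posed gripper keypoints and scene.

    Because it depends only on pairwise distances, which rigid transforms
    preserve, the energy is invariant under a joint SE(3) action on pose
    and scene.
    """
    R, t = T
    keypoints = np.array([[0.0, 0.0, 0.0],
                          [0.0, 0.0, 0.1]])    # hypothetical gripper-frame points
    posed = keypoints @ R.T + t                # keypoints in world frame
    d = np.linalg.norm(posed[:, None, :] - scene[None, :, :], axis=-1)
    return -np.log(np.exp(-d).sum())

rng = np.random.default_rng(0)
scene = rng.standard_normal((50, 3))           # stand-in for an observed point cloud
T = random_se3(rng)
g = random_se3(rng)

# Act with g on both the pose (left multiplication) and the scene points.
Rg, tg = g
RT, tT = T
gT = (Rg @ RT, Rg @ tT + tg)
g_scene = scene @ Rg.T + tg

print(np.isclose(energy(T, scene), energy(gT, g_scene)))   # True up to float error
```

A model with this property never has to relearn a skill for every placement of the target object, which is the intuition behind the sample-efficiency gains claimed above; the paper's contribution is constructing trainable neural energies with this property via SE(3) representation theory, rather than the hand-built distance energy used in this sketch.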