Robotic grasping for a diverse set of objects is essential in many robot manipulation tasks. One promising approach is to learn deep grasping models from training datasets of object images and grasp labels. Approaches in this category require millions of labeled samples to train deep models. However, empirical grasping datasets typically consist of sparsely labeled samples (i.e., a limited number of successful grasp labels in each image). This paper proposes a Maximum Likelihood Grasp Sampling Loss (MLGSL) to tackle the data sparsity issue. The proposed method assumes that successful grasp labels are sampled from a ground-truth grasp distribution and aims to recover the underlying ground-truth map. MLGSL is used to train a fully convolutional network that detects thousands of grasps simultaneously. Training results suggest that models trained with MLGSL can learn to grasp from datasets containing only two labels per image, implying that the method is 8x more data-efficient than current state-of-the-art techniques. Meanwhile, physical robot experiments demonstrate equivalent performance in detecting robust grasps, achieving a 91.8% grasp success rate on household objects.
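To make the core idea concrete, the following is a minimal sketch of a maximum-likelihood loss over sparse grasp labels, assuming the network outputs a per-pixel grasp-quality logit map and that labels are pixel coordinates of successful grasps. The function name, input format, and exact normalization are illustrative assumptions, not the paper's implementation; the abstract specifies only that labels are treated as samples from a ground-truth grasp distribution whose likelihood is maximized.

```python
import numpy as np

def mlgsl_sketch(quality_logits: np.ndarray, grasp_pixels: list[tuple[int, int]]) -> float:
    """Illustrative maximum-likelihood grasp sampling loss (hypothetical form).

    Treats the predicted per-pixel grasp-quality map as an unnormalized
    log-distribution over grasp locations and maximizes the likelihood of
    the sparse set of labeled successful grasps.
    """
    flat = quality_logits.reshape(-1)
    # Numerically stable log-softmax over all pixels: log p = x - logsumexp(x).
    m = flat.max()
    log_z = m + np.log(np.exp(flat - m).sum())
    log_probs = flat - log_z
    # Convert (row, col) labels to flat indices.
    _, w = quality_logits.shape
    idx = [r * w + c for r, c in grasp_pixels]
    # Negative log-likelihood of the labeled grasps under the predicted map.
    return float(-log_probs[idx].mean())

# Usage: an image with only 2 grasp labels, mirroring the sparse-label setting.
logits = np.random.randn(224, 224)
loss = mlgsl_sketch(logits, [(40, 60), (120, 180)])
```

Under this reading, unlabeled pixels are never explicitly penalized; raising the likelihood of the labeled grasps pushes probability mass away from all other locations through the normalization term, which is what lets training proceed with very few labels per image.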