Object grasping in cluttered scenes is a widely studied problem in robot manipulation. Most current works estimate grasp poses from point clouds with an efficient single-shot grasp detection network. However, because such networks lack geometric awareness of the local grasping area, they can produce severe collisions and unstable grasp configurations. In this paper, we propose a two-stage grasp pose refinement network that detects grasps globally while fine-tuning low-quality grasps and filtering noisy grasps locally. Furthermore, we extend the 6-DoF grasp representation with an extra dimension, the grasp width, which is critical for collision-free grasping in cluttered scenes. The network takes a single-view point cloud as input and predicts dense, precise grasp configurations. To enhance generalization, we build a synthetic single-object grasp dataset covering 150 commodities of various shapes, and a multi-object cluttered-scene dataset of 100k point clouds with robust, dense grasp poses and mask annotations. Experiments on a Yumi IRB-1400 robot demonstrate that a model trained on our dataset performs well in real environments and outperforms previous methods by a large margin.
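To make the extended grasp representation concrete, the sketch below shows one plausible way to encode a 6-DoF pose plus the extra width dimension, and a placeholder for the local filtering stage. This is a minimal illustration, not the authors' implementation; the names `Grasp` and `filter_low_quality`, and the score threshold, are hypothetical.

```python
# Minimal sketch (assumed, not the paper's code) of a grasp
# configuration: 6-DoF pose extended with a gripper-width dimension.
from dataclasses import dataclass
import numpy as np

@dataclass
class Grasp:
    translation: np.ndarray  # (3,) gripper center in the scene frame
    rotation: np.ndarray     # (3, 3) gripper orientation, an SO(3) matrix
    width: float             # gripper opening; helps avoid collisions in clutter
    score: float             # predicted grasp quality

def filter_low_quality(grasps, score_thresh=0.5):
    """Placeholder for the local refinement stage: keep only grasps
    whose predicted quality exceeds a threshold (threshold is assumed)."""
    return [g for g in grasps if g.score >= score_thresh]
```

In such a parameterization, each grasp is a 7-dimensional configuration (3 for translation, 3 for orientation, 1 for width), matching the abstract's description of extending the 6-DoF grasp with a width dimension.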