Proposing grasp poses for novel objects is an essential component of any robot manipulation task. Planning six-degree-of-freedom (6-DoF) grasps with a single camera, however, is challenging because of complex object shapes, incomplete object information, and sensor noise. In this paper, we present a 6-DoF contrastive grasp proposal network (CGPN) that infers 6-DoF grasps from a single-view depth image. First, an image encoder extracts a feature map from the input depth image, and a rotated region proposal network then proposes 3-DoF grasp regions from that feature map. Feature vectors within the proposed grasp regions are then extracted and refined into 6-DoF grasps. The model is trained offline on synthetic grasp data. To improve robustness in the real world and bridge the simulation-to-real gap, we further introduce a contrastive learning module and various image processing techniques during training. CGPN can locate collision-free grasps of an object from a single-view depth image within 0.5 seconds. Experiments on a physical robot further demonstrate the effectiveness of the algorithm.
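The abstract does not state which objective the contrastive learning module uses. As an illustration only, the sketch below shows an InfoNCE-style loss, one common contrastive objective, under the assumption that each training sample yields two feature vectors: one from a clean synthetic depth image and one from a perturbed copy of the same scene. The names `l2_normalize` and `info_nce_loss`, the `temperature` value, and the toy feature vectors are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of paired feature vectors.

    Assumption: anchors[i] and positives[i] are features of two views of the
    same scene; positives[j] with j != i act as negatives for anchors[i].
    """
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need two non-empty batches of equal size")
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching pair as the correct class.
        total += log_denom - logits[i]
    return total / len(a)


# Toy features (hypothetical): "clean" views and slightly perturbed copies.
clean = [[1.0, 0.0], [0.0, 1.0]]
noisy = [[0.9, 0.1], [0.1, 0.9]]
aligned = info_nce_loss(clean, noisy)         # correct pairing
shuffled = info_nce_loss(clean, noisy[::-1])  # deliberately mismatched pairing
```

With correctly paired views the loss is small, and with mismatched pairs it is large. That is the property such an objective relies on to make features insensitive to the perturbation while still distinguishing different scenes.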