Robotic grasping is a fundamental ability that lets a robot interact with its environment. Current methods focus on obtaining a stable and reliable grasp pose at the object level, while little work has addressed part-wise (shape-wise) grasping, which is related to fine-grained grasping and robotic affordance. Parts can be seen as the atomic elements that compose an object; they carry rich semantic knowledge and correlate strongly with affordance. However, the lack of a large part-wise 3D robotic dataset limits the development of part representation learning and downstream applications. In this paper, we propose a new large Language-guided SHape grAsPing datasEt (named Lang-SHAPE) for learning 3D part-wise affordance and grasping ability. We design a novel two-stage fine-grained robotic grasping network (named PIONEER), comprising a novel 3D part language grounding model and a part-aware grasp pose detection model. To evaluate its effectiveness, we perform part language grounding grasping experiments at multiple levels of difficulty and deploy the proposed model on a real robot. Results show that our method achieves satisfactory performance and efficiency in reference identification, affordance inference, and 3D part-aware grasping. Our dataset and code are available on our project website: https://sites.google.com/view/lang-shape