Grasping in cluttered scenes has long been a major challenge for robots, as it requires a thorough understanding of both the scene and the objects within it. Previous works typically assume that object geometry is available in advance, or adopt a step-wise, multi-stage pipeline to predict feasible 6-DoF grasp poses. In this work, we formulate 6-DoF grasp pose estimation as a simultaneous multi-task learning problem. Within a unified framework, we jointly predict feasible 6-DoF grasp poses, instance semantic segmentation, and collision information. The whole framework is jointly optimized and end-to-end differentiable. Our model is evaluated on large-scale benchmarks as well as on a real robot system. On the public dataset, our method outperforms prior state-of-the-art methods by a large margin (+4.08 AP). We also deploy our model on a real robotic platform and show that the robot can accurately grasp target objects in cluttered scenarios with a high success rate. Project link: https://openbyterobotics.github.io/sscl
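As a rough illustration of the joint multi-task formulation, one way to realize it is a shared point-cloud backbone feeding three task heads, trained with a single weighted loss so all tasks are optimized jointly and the whole network stays end-to-end differentiable. The sketch below is a minimal, hypothetical PyTorch version: the backbone, head shapes, grasp parameterization, and loss weights are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class JointGraspNet(nn.Module):
    """Hypothetical sketch: shared features, three jointly trained heads."""

    def __init__(self, feat_dim=256, num_classes=20):
        super().__init__()
        # Stand-in for a point-cloud backbone (per-point shared MLP).
        self.backbone = nn.Sequential(
            nn.Conv1d(3, 128, 1), nn.ReLU(),
            nn.Conv1d(128, feat_dim, 1), nn.ReLU(),
        )
        # Per-point grasp head: 3D offset + quaternion (7 values, assumed encoding).
        self.grasp_head = nn.Conv1d(feat_dim, 7, 1)
        # Per-point semantic/instance segmentation logits.
        self.seg_head = nn.Conv1d(feat_dim, num_classes, 1)
        # Per-point collision logit.
        self.collision_head = nn.Conv1d(feat_dim, 1, 1)

    def forward(self, points):  # points: (B, 3, N)
        feats = self.backbone(points)
        return self.grasp_head(feats), self.seg_head(feats), self.collision_head(feats)

def joint_loss(grasp_pred, seg_pred, col_pred, grasp_gt, seg_gt, col_gt,
               w_grasp=1.0, w_seg=1.0, w_col=1.0):
    # One differentiable objective so all three tasks are optimized together;
    # the individual loss terms and weights here are assumptions.
    l_grasp = nn.functional.smooth_l1_loss(grasp_pred, grasp_gt)
    l_seg = nn.functional.cross_entropy(seg_pred, seg_gt)          # seg_gt: (B, N) long
    l_col = nn.functional.binary_cross_entropy_with_logits(col_pred, col_gt)
    return w_grasp * l_grasp + w_seg * l_seg + w_col * l_col

# Usage with synthetic shapes:
# net = JointGraspNet()
# g, s, c = net(torch.rand(2, 3, 1024))
```

Because all heads share one backbone and one loss, gradients from segmentation and collision prediction also shape the features used for grasp estimation, which is the point of the simultaneous multi-task formulation.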