Current task-oriented grasp detection approaches are mostly based on pixel-level affordance detection and semantic segmentation. These pixel-level approaches rely heavily on the accuracy of a 2D affordance mask, and the generated grasp candidates are restricted to a small workspace. To mitigate these limitations, we first construct a novel affordance-based grasp dataset and then propose a 6-DoF task-oriented grasp detection framework, which takes the observed object point cloud as input and predicts diverse 6-DoF grasp poses for different tasks. Specifically, the implicit estimation network and the visual affordance network in our framework directly predict coarse grasp candidates and the corresponding 3D affordance heatmap for each potential task, respectively. Furthermore, the grasping scores of the coarse candidates are combined with the heatmap values to generate more accurate and finer candidates. Our proposed framework shows significant improvements over baselines on both existing and novel objects in our simulation dataset. Although our framework is trained only on simulated objects and environments, the generated grasp candidates can be executed accurately and stably in real-robot experiments when the object is randomly placed on a support surface.
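The abstract does not specify how the grasping scores and the affordance heatmap values are fused, so the following is only a minimal sketch of one plausible re-ranking step: each coarse grasp inherits the heatmap value of its nearest object point, and the two terms are blended by a convex combination. The function name `combine_scores`, the weight `alpha`, and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def combine_scores(grasp_scores, grasp_points, heatmap_points, heatmap_values, alpha=0.5):
    """Re-rank coarse grasp candidates for one task.

    grasp_scores   : (N,) coarse grasp quality scores
    grasp_points   : (N, 3) grasp contact positions on the object
    heatmap_points : (M, 3) object point cloud coordinates
    heatmap_values : (M,) task-specific 3D affordance heatmap values
    alpha          : assumed blending weight between the two terms
    Returns candidate indices sorted from best to worst.
    """
    # For each grasp contact point, find its nearest point in the cloud.
    dists = np.linalg.norm(grasp_points[:, None, :] - heatmap_points[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    # Blend the coarse grasp score with the local affordance value.
    fused = alpha * grasp_scores + (1.0 - alpha) * heatmap_values[nearest]
    return np.argsort(-fused)
```

Under this assumed fusion rule, a candidate is preferred only when it is both geometrically stable (high coarse score) and located on a task-relevant region (high heatmap value); other fusion schemes (e.g., multiplicative weighting) would fit the same interface.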