We propose an approach to multi-modal grasp detection that jointly predicts the probabilities that several types of grasps will succeed at a given grasp pose. Given a partial point cloud of a scene, the algorithm proposes a set of feasible grasp candidates, then estimates the probability that a grasp of each type would succeed at each candidate pose. Predicting grasp success probabilities directly from point clouds makes our approach agnostic to the number and placement of depth sensors at execution time. We evaluate our system both in simulation and on a real robot with a Robotiq 3-Finger Adaptive Gripper, and compare our network against several baselines that perform fewer types of grasps. Our experiments show that a system that explicitly models grasp type achieves an object retrieval rate 8.5% higher than our highest-performing baseline in a complex cluttered environment.