Grasping arbitrary objects in densely cluttered novel environments is a crucial skill for robots. Though many existing systems enable two-finger parallel-jaw grippers to pick items from clutter, these grippers cannot perform multiple types of grasps. Multi-modal grasping with multi-finger grippers, however, could clear objects of varying sizes from cluttered scenes far more effectively. We propose an approach to multi-modal grasp detection that jointly predicts the probabilities that several types of grasps succeed at a given grasp pose. Given a partial point cloud of a scene, the algorithm proposes a set of feasible grasp candidates, then estimates the probability that a grasp of each type would succeed at each candidate pose. Predicting grasp success probabilities directly from point clouds makes our approach agnostic to the number and placement of depth sensors at execution time. We evaluate our system both in simulation and on a real robot with a Robotiq 3-Finger Adaptive Gripper, and compare our network against several baselines that perform fewer types of grasps. Our experiments show that a system that explicitly models grasp type achieves an object retrieval rate 8.5% higher in a complex cluttered environment than our highest-performing baseline.
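The selection step described above, in which a single network jointly scores several grasp types per candidate pose and the executed grasp is the highest-probability (pose, type) pair, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grasp type names, the `predict_success` stand-in for the learned network, and the toy score table are all hypothetical.

```python
# Hypothetical grasp modes for a multi-finger gripper (not taken from the paper).
GRASP_TYPES = ["fingertip", "encompassing"]

def predict_success(candidate):
    """Stand-in for the learned network: maps a candidate grasp pose to one
    success probability per grasp type. Here it just reads a toy score table."""
    return candidate["scores"]

def select_grasp(candidates):
    """Return the (pose, grasp_type, probability) triple with the highest
    predicted success probability across all candidates and grasp types."""
    best = None
    for cand in candidates:
        probs = predict_success(cand)
        for gtype, p in zip(GRASP_TYPES, probs):
            if best is None or p > best[2]:
                best = (cand["pose"], gtype, p)
    return best

# Toy example: two candidate poses, each scored once per grasp type.
candidates = [
    {"pose": "pose_A", "scores": [0.42, 0.91]},
    {"pose": "pose_B", "scores": [0.75, 0.30]},
]
print(select_grasp(candidates))  # ('pose_A', 'encompassing', 0.91)
```

Jointly scoring all grasp types at every pose, rather than committing to one gripper configuration up front, is what lets the system match the grasp mode to the object's size and surrounding clutter.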