Shape informs both where and how an object should be grasped. This paper therefore describes a segmentation-based architecture that decomposes objects sensed with a depth camera into multiple primitive shapes, along with a post-processing pipeline for robotic grasping. Segmentation employs a deep network, called PS-CNN, trained on synthetic data that covers 6 classes of primitive shapes and is generated using a simulation engine. Each primitive shape is designed with parametrized grasp families, permitting the pipeline to identify multiple grasp candidates per shape region. The grasps are rank-ordered, with the first feasible one chosen for execution. For task-free grasping of individual objects, the method achieves a 94.2% success rate, placing it among the top-performing grasp methods when compared to top-down and SE(3)-based approaches. Additional tests involving variable viewpoints and clutter demonstrate robustness to the setup. For task-oriented grasping, PS-CNN achieves a 93.0% success rate. Overall, the outcomes support the hypothesis that explicitly encoding shape primitives within a grasping pipeline should boost grasping performance, for both task-free and task-relevant grasp prediction.
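The selection step described above (rank the grasp candidates, then execute the first feasible one) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `GraspCandidate` fields, the `score` used for ranking, the primitive labels, and the `is_feasible` callback are all hypothetical names introduced here for illustration, and the abstract does not specify how ranking or feasibility checking is actually done.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class GraspCandidate:
    """Hypothetical container for one grasp proposed on a primitive-shape region."""
    primitive: str             # illustrative label only, e.g. "cylinder"
    score: float               # assumed ranking score; higher means preferred
    pose: Tuple[float, ...]    # placeholder for a grasp pose


def select_grasp(
    candidates: Iterable[GraspCandidate],
    is_feasible: Callable[[GraspCandidate], bool],
) -> Optional[GraspCandidate]:
    """Rank candidates by score and return the first feasible one, or None."""
    ranked = sorted(candidates, key=lambda g: g.score, reverse=True)
    for grasp in ranked:
        # is_feasible stands in for whatever check the pipeline uses
        # (e.g. reachability or collision tests); it is an assumption here.
        if is_feasible(grasp):
            return grasp
    return None


if __name__ == "__main__":
    candidates = [
        GraspCandidate("cylinder", 0.9, (0.0, 0.0, 0.1)),
        GraspCandidate("cuboid", 0.7, (0.1, 0.0, 0.1)),
        GraspCandidate("cylinder", 0.4, (0.0, 0.1, 0.1)),
    ]
    # Toy feasibility rule: pretend the top-ranked grasp is unreachable.
    chosen = select_grasp(candidates, lambda g: g.score < 0.8)
    print(chosen)
```

Returning `None` when no candidate passes the check leaves the fallback behavior (re-sensing, re-planning, or aborting) to the caller, since the abstract does not say what the pipeline does in that case.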