Perceiving and manipulating objects in a generalizable way has been actively studied by the computer vision and robotics communities, where cross-category generalizable manipulation skills are highly desired yet underexplored. In this work, we propose to learn such generalizable perception and manipulation via Generalizable and Actionable Parts (GAParts). By identifying and defining 9 GAPart classes (e.g., buttons, handles), we show that our part-centric approach allows our method to learn object perception and manipulation skills from seen object categories and directly generalize to unseen categories. Following the GAPart definition, we construct a large-scale part-centric interactive dataset, GAPartNet, which provides rich part-level annotations (semantics, poses) for 1166 objects and 8489 part instances. Based on GAPartNet, we investigate three cross-category tasks: part segmentation, part pose estimation, and part-based object manipulation. Given the large domain gaps between seen and unseen object categories, we propose a strong 3D segmentation method from the perspective of domain generalization by integrating adversarial learning techniques. Our method outperforms all existing methods by a large margin on both seen and unseen categories. Furthermore, with the part segmentation and pose estimation results, we leverage the GAPart pose definition to design part-based manipulation heuristics that generalize well to unseen object categories in both simulation and the real world. The dataset and code will be released.