Object instance segmentation is a key challenge for indoor robots navigating cluttered environments with many small objects. Limitations in 3D sensing capabilities often make it difficult to detect every possible object. While deep learning approaches may be effective for this problem, manually annotating 3D data for supervised learning is time-consuming. In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a semantic category-agnostic manner. We introduce a zero-shot split of the Tabletop Objects Dataset (TOD-Z) to enable this study and present a method that uses annotated objects to learn the ``objectness'' of pixels and generalize to unseen object categories in cluttered indoor environments. Our method, SupeRGB-D, groups pixels into small patches based on geometric cues and learns to merge the patches in a deep agglomerative clustering fashion. SupeRGB-D outperforms existing baselines on unseen objects while achieving similar performance on seen objects. Additionally, it is extremely lightweight (0.4 MB memory footprint), making it suitable for mobile and robotic applications. The dataset split and code will be made publicly available upon acceptance.
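To make the merging idea concrete, below is a minimal, illustrative Python sketch of agglomerative clustering over patch features. It is not the paper's implementation: `merge_score` is a hypothetical stand-in for the learned merge network, and the feature vectors are placeholders for the geometric cues (e.g., patch centroids or normals) the method would extract from RGB-D input.

```python
import numpy as np

def merge_score(f_a: np.ndarray, f_b: np.ndarray) -> float:
    # Stand-in for a learned scorer: similarity decays with feature
    # distance. A trained merge network would replace this function.
    return float(np.exp(-np.linalg.norm(f_a - f_b)))

def agglomerate(features: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Greedy agglomerative clustering over patch features.

    features: (num_patches, feat_dim) array, one row per initial patch.
    Returns an array mapping each input patch to a final cluster id.
    """
    clusters = [[i] for i in range(len(features))]  # patch indices per cluster
    feats = [features[i].astype(float) for i in range(len(features))]
    sizes = [1] * len(features)

    while len(clusters) > 1:
        # Find the highest-scoring pair of current clusters.
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = merge_score(feats[a], feats[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break  # no remaining pair is confident enough to merge
        a, b = pair
        # Merge cluster b into a; update the feature as a size-weighted mean.
        feats[a] = (sizes[a] * feats[a] + sizes[b] * feats[b]) / (sizes[a] + sizes[b])
        sizes[a] += sizes[b]
        clusters[a].extend(clusters[b])
        del clusters[b], feats[b], sizes[b]

    # Label each original patch with its final cluster id.
    labels = np.empty(len(features), dtype=int)
    for cid, members in enumerate(clusters):
        labels[members] = cid
    return labels

# Toy usage: six patches forming two well-separated groups.
patches = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
print(agglomerate(patches, threshold=0.5))  # e.g., [0 0 0 1 1 1]
```

The greedy loop mirrors classical agglomerative clustering, but with the handcrafted linkage criterion replaced by a learned pairwise merge decision; this is what ``deep agglomerative clustering'' refers to in the abstract.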