We propose Scan2Part, a method to segment individual parts of objects in real-world, noisy indoor RGB-D scans. To this end, we vary the part hierarchies of objects in indoor scenes and explore their effect on scene understanding models. Specifically, we use a sparse U-Net-based architecture that captures the fine-scale detail of the underlying 3D scan geometry by leveraging a multi-scale feature hierarchy. In order to train our method, we introduce the Scan2Part dataset, which is the first large-scale collection providing detailed semantic labels at the part level in the real-world setting. In total, we provide 242,081 correspondences between 53,618 PartNet parts of 2,477 ShapeNet objects and 1,506 ScanNet scenes, at two spatial resolutions of 2 cm$^3$ and 5 cm$^3$. As output, we are able to predict fine-grained per-object part labels, even when the geometry is coarse or partially missing.
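To make the two spatial resolutions mentioned above concrete, here is a minimal sketch of quantizing a point cloud into sparse occupied voxels at 2 cm and 5 cm grids. The function name and point values are illustrative assumptions, not taken from the authors' code; real sparse-convolution pipelines perform this quantization (plus feature pooling) internally.

```python
# Hedged sketch: map 3D points (in metres) to occupied voxel coordinates.
# At a coarser voxel size, nearby points collapse into the same voxel,
# which is why coarse geometry loses fine part detail.

def voxelize(points, voxel_size):
    """Return the sorted set of voxel coordinates occupied by `points`."""
    return sorted({(int(x // voxel_size),
                    int(y // voxel_size),
                    int(z // voxel_size)) for x, y, z in points})

points = [(0.01, 0.01, 0.0), (0.03, 0.01, 0.0), (0.07, 0.01, 0.0)]
fine = voxelize(points, 0.02)    # 2 cm grid: 3 distinct voxels
coarse = voxelize(points, 0.05)  # 5 cm grid: only 2 voxels remain
```

Note how the 5 cm grid merges the first two points into one voxel while the 2 cm grid keeps all three apart, mirroring the trade-off between the dataset's two resolutions.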