Current 3D segmentation methods rely heavily on large-scale point-cloud datasets, which are notoriously laborious to annotate. Few attempts have been made to circumvent the need for dense per-point annotations. In this work, we study weakly supervised 3D semantic instance segmentation. The key idea is to leverage 3D bounding box labels, which are easier and faster to annotate. Indeed, we show that it is possible to train dense segmentation models using only bounding box labels. At the core of our method, Box2Mask, lies a deep model, inspired by classical Hough voting, that directly votes for bounding box parameters, together with a clustering method specifically tailored to bounding box votes. This goes beyond the commonly used center votes, which would not fully exploit the bounding box annotations. On the ScanNet test set, our weakly supervised model outperforms other weakly supervised approaches (+18 mAP@50). Remarkably, it also achieves 97% of the mAP@50 score of current fully supervised models. To further illustrate the practicality of our work, we train Box2Mask on the recently released ARKitScenes dataset, which is annotated with 3D bounding boxes only, and show, for the first time, compelling 3D instance segmentation masks.
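To make the idea of clustering bounding box votes concrete, the following is a minimal sketch and not the Box2Mask implementation. It assumes that each point has already voted for an axis-aligned box `(xmin, ymin, zmin, xmax, ymax, zmax)` with positive volume and that each vote carries a confidence score. The greedy IoU grouping, the function names `box_iou` and `cluster_box_votes`, the `scores` input, and the `iou_thresh` parameter are all illustrative assumptions; the abstract does not specify the actual clustering procedure.

```python
import numpy as np


def box_iou(box, boxes):
    """IoU between one axis-aligned 3D box and an array of boxes.

    Boxes are (xmin, ymin, zmin, xmax, ymax, zmax) with positive volume.
    """
    lo = np.maximum(box[:3], boxes[:, :3])
    hi = np.minimum(box[3:], boxes[:, 3:])
    # Overlap extent per axis, clipped at zero when boxes do not intersect.
    inter = np.prod(np.clip(hi - lo, 0.0, None), axis=1)
    vol = np.prod(box[3:] - box[:3])
    vols = np.prod(boxes[:, 3:] - boxes[:, :3], axis=1)
    return inter / (vol + vols - inter)


def cluster_box_votes(votes, scores, iou_thresh=0.5):
    """Greedy grouping of per-point box votes (hypothetical simplification).

    The highest-scoring unassigned vote becomes a representative box; every
    unassigned vote whose IoU with it reaches iou_thresh joins its instance.
    Returns one instance ID per point.
    """
    labels = np.full(len(votes), -1, dtype=int)
    next_id = 0
    for i in np.argsort(-scores):  # visit votes from most to least confident
        if labels[i] != -1:
            continue
        members = (labels == -1) & (box_iou(votes[i], votes) >= iou_thresh)
        labels[members] = next_id
        next_id += 1
    return labels


# Toy input: three points voting for one box, two points voting for another.
votes = np.array([
    [0.00, 0.00, 0.0, 1.00, 1.00, 1.0],
    [0.05, 0.00, 0.0, 1.05, 1.00, 1.0],
    [0.00, 0.02, 0.0, 1.00, 1.02, 1.0],
    [3.00, 3.00, 0.0, 4.00, 4.00, 2.0],
    [3.10, 3.00, 0.0, 4.10, 4.00, 2.0],
])
scores = np.array([0.9, 0.8, 0.7, 0.95, 0.6])
labels = cluster_box_votes(votes, scores)
```

In this toy input the first three points end up sharing one instance ID and the last two share another, so the per-point labels directly form instance masks. Comparing whole boxes by IoU uses the extent of each vote as well as its position, which is the sense in which box votes carry more information than center votes.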