Large-scale training data with high-quality annotations is critical for training semantic and instance segmentation models. Unfortunately, pixel-wise annotation is labor-intensive and costly, raising the demand for more efficient labeling strategies. In this work, we present a novel 3D-to-2D label transfer method, Panoptic NeRF, which aims to obtain per-pixel 2D semantic and instance labels from easy-to-obtain coarse 3D bounding primitives. Our method utilizes NeRF as a differentiable tool to unify coarse 3D annotations and 2D semantic cues transferred from existing datasets. We demonstrate that this combination allows for improved geometry guided by semantic information, enabling rendering of accurate semantic maps across multiple views. Furthermore, this fusion process resolves label ambiguity in the coarse 3D annotations and filters noise in the 2D predictions. By inferring in 3D space and rendering to 2D labels, our 2D semantic and instance labels are multi-view consistent by design. Experimental results show that Panoptic NeRF outperforms existing label transfer methods in terms of accuracy and multi-view consistency on challenging urban scenes of the KITTI-360 dataset.
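The "inferring in 3D space and rendering to 2D labels" step can be illustrated with a minimal sketch of NeRF-style volume rendering applied to semantic logits instead of color. All function and variable names here are illustrative, not from the Panoptic NeRF codebase; the sketch only shows the standard alpha-compositing mechanism that makes the rendered labels multi-view consistent by construction.

```python
import numpy as np

def render_semantic_ray(sigmas, semantic_logits, deltas):
    """Composite per-sample semantic logits along one camera ray.

    sigmas:          (N,) volume densities at N samples along the ray
    semantic_logits: (N, C) per-sample class logits for C classes
    deltas:          (N,) distances between consecutive samples

    Illustrative sketch: standard NeRF alpha compositing, with
    semantic logits substituted for RGB color.
    """
    # Per-sample opacity from density and step size
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Accumulated transmittance up to (but not including) each sample
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    # Compositing weights sum the contribution of each 3D sample
    weights = alphas * transmittance
    # Weighted sum of logits, then argmax gives the 2D label for this pixel
    pixel_logits = (weights[:, None] * semantic_logits).sum(axis=0)
    return int(pixel_logits.argmax())
```

Because the label is derived from a single shared 3D field, rendering the same scene point from different viewpoints reuses the same densities and logits, which is what yields multi-view consistent 2D labels.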