We present a novel 3D instance segmentation framework for Multi-View Stereo (MVS) buildings in urban scenes. Unlike existing work, which focuses on semantic segmentation of urban scenes, this work emphasizes detecting and segmenting 3D building instances even when they are attached to one another and embedded in a large, imprecise 3D surface model. Multi-view RGB images are first enhanced to RGBH images by adding a heightmap, and are then segmented with a fine-tuned 2D instance segmentation neural network to obtain all roof instances. Instance masks from the different multi-view images are then clustered into global masks. Our mask clustering accounts for spatial occlusion and overlap, which can eliminate segmentation ambiguities among multi-view images. Based on these global masks, 3D roof instances are segmented out by mask back-projection and extended to entire building instances through a Markov random field optimization. We also provide a new dataset with instance-level annotations for both 3D urban scenes (roofs and buildings) and drone images (roofs). To the best of our knowledge, it is the first outdoor dataset dedicated to 3D instance segmentation, and it contains many more annotations of attached 3D buildings than existing datasets. Quantitative evaluations and ablation studies show the effectiveness of all major steps and the advantages of our multi-view framework over the orthophoto-based method.
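The step of clustering per-view instance masks into global masks can be illustrated with a minimal sketch. It is not the method of this work: it assumes each 2D mask has already been back-projected to a set of mesh face ids, and it merges masks from different views by a plain intersection-over-union test with union-find, ignoring the occlusion handling described above. The names `iou`, `cluster_masks`, the `(view_id, face_ids)` layout, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def iou(a, b):
    """Intersection-over-union of two sets of face ids."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cluster_masks(masks, threshold=0.5):
    """Group per-view masks into global masks (simplified, assumed scheme).

    masks: list of (view_id, set_of_face_ids) pairs (hypothetical layout).
    Returns a sorted list of clusters, each a list of indices into `masks`.
    """
    parent = list(range(len(masks)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(masks)), 2):
        (view_i, faces_i), (view_j, faces_j) = masks[i], masks[j]
        # Masks from the same image are never compared directly.
        if view_i == view_j:
            continue
        if iou(faces_i, faces_j) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(masks)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy input: two roofs seen in two views each, plus one roof seen once.
example_masks = [
    ("view0", {1, 2, 3, 4}),
    ("view0", {10, 11, 12}),
    ("view1", {2, 3, 4, 5}),
    ("view1", {10, 11, 12, 13}),
    ("view2", {20, 21}),
]
example_clusters = cluster_masks(example_masks)
```

On the toy input, masks 0 and 2 and masks 1 and 3 are grouped together, and mask 4 stays alone. Because merging is transitive, two masks from the same view can still end up in one cluster through a third mask; a fuller treatment would need to resolve such conflicts, which is the kind of ambiguity the occlusion- and overlap-aware clustering is meant to address.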