3D point clouds are rich in geometric structure information, while 2D images contain important and continuous texture information. Combining 2D information to achieve better 3D semantic segmentation has become mainstream in 3D scene understanding. Despite this success, it remains unclear how best to fuse and process the cross-dimensional features from these two distinct spaces. Existing state-of-the-art methods usually exploit bidirectional projection to align the cross-dimensional features and perform both 2D and 3D semantic segmentation. However, to enable bidirectional mapping, this framework often requires a symmetrical 2D-3D network structure, which limits the network's flexibility. Meanwhile, such a dual-task setting may easily distract the network and lead to over-fitting in the 3D segmentation task. Because of this inflexibility, the fused features can only pass through a decoder network, whose insufficient depth limits model performance. To alleviate these drawbacks, in this paper we argue that, despite its simplicity, unidirectionally projecting multi-view 2D deep semantic features into 3D space and aligning them with 3D deep semantic features could lead to better feature fusion. On the one hand, the unidirectional projection makes our model focus more on the core task, i.e., 3D segmentation; on the other hand, relaxing bidirectional projection to unidirectional projection enables a deeper cross-domain semantic alignment and offers the flexibility to fuse better and more complex features from very different spaces. Among joint 2D-3D approaches, our proposed method achieves superior performance on the ScanNetv2 benchmark for 3D semantic segmentation.
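To make the idea of unidirectional 2D-to-3D projection concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a pinhole camera model with known per-view intrinsics and world-to-camera extrinsics, nearest-pixel sampling, averaging over the views in which a point is visible, no occlusion test, and fusion by simple concatenation. The function names `project_2d_features_to_3d` and `fuse` are hypothetical.

```python
import numpy as np

def project_2d_features_to_3d(points, feats_2d, intrinsics, world_to_cam):
    """Lift multi-view 2D feature maps onto 3D points (illustrative sketch).

    points:       (N, 3) world coordinates
    feats_2d:     (V, H, W, C) per-view 2D feature maps
    intrinsics:   (V, 3, 3) pinhole camera matrices (assumed known)
    world_to_cam: (V, 4, 4) extrinsic matrices (assumed known)
    Returns the (N, C) view-averaged features and the (N,) view counts.
    """
    n = points.shape[0]
    num_views, h, w, c = feats_2d.shape
    homog = np.concatenate([points, np.ones((n, 1))], axis=1)  # (N, 4)
    accum = np.zeros((n, c))
    counts = np.zeros(n)
    for i in range(num_views):
        cam = (world_to_cam[i] @ homog.T).T[:, :3]   # points in camera frame
        z = cam[:, 2]
        in_front = z > 1e-6                          # drop points behind the camera
        safe_z = np.where(in_front, z, 1.0)          # avoid division by zero
        pix = (intrinsics[i] @ cam.T).T
        u = np.round(pix[:, 0] / safe_z).astype(int)  # column index
        v = np.round(pix[:, 1] / safe_z).astype(int)  # row index
        visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        # Assumption: no depth/occlusion check; every in-frustum point is sampled.
        accum[visible] += feats_2d[i, v[visible], u[visible]]
        counts[visible] += 1
    mean = accum / np.maximum(counts, 1)[:, None]    # points seen by no view stay zero
    return mean, counts

def fuse(feats_3d, projected_2d):
    """Concatenate per-point 3D features with the projected 2D features."""
    return np.concatenate([feats_3d, projected_2d], axis=1)
```

Because the mapping only goes from 2D to 3D, the fused per-point features can be fed to any further 3D layers; a real system would typically also use a depth map to reject occluded points.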