Depth information matters in the RGB-D semantic segmentation task, as it provides additional geometric cues to complement color images. Most existing methods exploit a multi-stage fusion strategy to propagate depth features to the RGB branch. However, at the very deep stages, propagation via simple element-wise addition cannot fully utilize the depth information. We propose the Global-Local Propagation Network (GLPNet) to solve this problem. Specifically, a local context fusion module (L-CFM) is introduced to dynamically align the two modalities before element-wise fusion, and a global context fusion module (G-CFM) is introduced to propagate depth information to the RGB branch by jointly modeling the multi-modal global context features. Extensive experiments demonstrate the effectiveness and complementarity of the proposed fusion modules. Embedding the two fusion modules into a two-stream encoder-decoder structure, our GLPNet achieves new state-of-the-art performance on two challenging indoor scene segmentation datasets, i.e., NYU-Depth v2 and SUN-RGBD.
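To make the fusion scheme concrete, below is a minimal PyTorch sketch of the two-stream fusion idea. The abstract does not specify the internals of L-CFM or G-CFM, so the module bodies here are illustrative assumptions: the L-CFM stand-in approximates "dynamic alignment" with a learned per-pixel gate on the depth features before element-wise addition, and the G-CFM stand-in approximates "joint global context modeling" with a squeeze-and-excitation-style descriptor pooled from both modalities. The class names `LocalContextFusion` and `GlobalContextFusion` are hypothetical, not the authors' code.

```python
# Illustrative sketch only -- the actual L-CFM/G-CFM designs are not given
# in the abstract; gating and global pooling are assumed approximations.
import torch
import torch.nn as nn


class LocalContextFusion(nn.Module):
    """Hypothetical L-CFM stand-in: align depth to RGB, then add element-wise."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel alignment gate from the concatenated modalities.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        aligned_depth = depth * self.gate(torch.cat([rgb, depth], dim=1))
        return rgb + aligned_depth  # element-wise fusion after alignment


class GlobalContextFusion(nn.Module):
    """Hypothetical G-CFM stand-in: joint global context modulates RGB."""

    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Channel-wise context computed jointly from both modalities.
        self.context = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        g = self.context(torch.cat([self.pool(rgb), self.pool(depth)], dim=1))
        return rgb + rgb * g  # propagate joint global context into the RGB branch


if __name__ == "__main__":
    rgb = torch.randn(2, 256, 30, 40)    # deep-stage RGB features
    depth = torch.randn(2, 256, 30, 40)  # deep-stage depth features
    fused = LocalContextFusion(256)(rgb, depth)
    fused = GlobalContextFusion(256)(fused, depth)
    print(fused.shape)  # torch.Size([2, 256, 30, 40])
```

The sketch follows the complementarity claim of the abstract: the local module operates per pixel before the element-wise addition, while the global module injects a jointly pooled multi-modal descriptor into the RGB branch.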