Scene depth information can complement visual information to achieve more accurate semantic segmentation. However, how to effectively integrate multi-modality information into representative features remains an open problem. Most existing work uses DCNNs to fuse multi-modality information implicitly, but as the network deepens, some critical discriminative features may be lost, which degrades segmentation performance. This work proposes a unified and efficient feature selection-and-fusion network (FSFNet), which contains a symmetric cross-modality residual fusion module for explicit fusion of multi-modality information. In addition, the network includes a detailed feature propagation module that preserves low-level detail information during the forward pass of the network. Experimental evaluations demonstrate that, compared with state-of-the-art methods, the proposed model achieves competitive performance on two public datasets.
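To make the idea of symmetric, explicit cross-modality fusion concrete, the sketch below shows one possible form such a module could take in PyTorch: each modality keeps its own features through a residual path and additionally receives gated features selected from the other modality. The class name, channel sizes, and sigmoid-gating scheme are illustrative assumptions for exposition, not the exact FSFNet design described in the paper.

```python
# A minimal sketch of a symmetric cross-modality residual fusion step.
# Assumptions: 1x1-conv sigmoid gates and element-wise gating; the actual
# FSFNet module may differ.
import torch
import torch.nn as nn

class SymmetricCrossModalFusion(nn.Module):
    """Fuse RGB and depth feature maps with symmetric residual cross-gating."""

    def __init__(self, channels: int):
        super().__init__()
        # One gate per modality: selects which cross-modal features to inject.
        self.gate_rgb = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1), nn.Sigmoid())
        self.gate_depth = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1), nn.Sigmoid())

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor):
        # Each modality keeps its own features (residual path) and adds
        # gated features selected from the other modality (fusion path).
        out_rgb = f_rgb + self.gate_rgb(f_depth) * f_depth
        out_depth = f_depth + self.gate_depth(f_rgb) * f_rgb
        return out_rgb, out_depth

if __name__ == "__main__":
    fusion = SymmetricCrossModalFusion(channels=64)
    rgb = torch.randn(1, 64, 60, 80)    # RGB branch feature map
    depth = torch.randn(1, 64, 60, 80)  # depth branch feature map
    fused_rgb, fused_depth = fusion(rgb, depth)
    print(fused_rgb.shape, fused_depth.shape)
```

The symmetric design means neither modality dominates: both branches are updated with complementary information while their original features are preserved by the residual connections, which is one way to avoid losing discriminative features as the network deepens.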