Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation

Recently, referring image segmentation has attracted widespread interest. Previous methods perform the multi-modal fusion between language and vision on the decoder side of the network, and the linguistic feature interacts with the visual feature at each scale separately, which ignores the continuous guidance that language can provide to multi-scale visual features. In this work, we propose an encoder fusion network (EFN), which transforms the visual encoder into a multi-modal feature learning network and uses language to refine the multi-modal features progressively. Moreover, a co-attention mechanism is embedded in the EFN to update the multi-modal features in parallel, which can promote the consistency of the cross-modal information representation in the semantic space. Finally, we propose a boundary enhancement module (BEM) to make the network pay more attention to fine structures. Experimental results on four benchmark datasets demonstrate that the proposed approach achieves state-of-the-art performance under different evaluation metrics without any post-processing.
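The abstract describes the co-attention mechanism only at a high level. To illustrate what a "parallel update" of two modalities can mean, here is a minimal sketch of a generic parallel co-attention step in pure Python. It assumes a plain scaled dot-product affinity between flattened visual positions and word features, with no learned projections, and a simple residual update; the function names (`softmax`, `weighted_sum`, `co_attention`) and these design choices are hypothetical and are not taken from the EFN paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], rows: Matrix) -> List[float]:
    """Return sum_k weights[k] * rows[k] as a single vector."""
    dim = len(rows[0])
    out = [0.0] * dim
    for w, row in zip(weights, rows):
        for j in range(dim):
            out[j] += w * row[j]
    return out


def co_attention(visual: Matrix, lang: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical parallel co-attention: both modalities use ONE affinity matrix.

    visual: N x d (flattened spatial positions); lang: T x d (word features).
    Assumption: scaled dot-product affinity, no learned weights, residual update.
    """
    d = len(visual[0])
    scale = 1.0 / math.sqrt(d)
    # Affinity A[i][t] between visual position i and word t.
    affinity = [[scale * sum(v[j] * w[j] for j in range(d)) for w in lang]
                for v in visual]

    # Language -> vision: each position attends over all words (row-wise softmax).
    lang_ctx = [weighted_sum(softmax(row), lang) for row in affinity]

    # Vision -> language: each word attends over all positions (column-wise softmax).
    columns = [[affinity[i][t] for i in range(len(visual))] for t in range(len(lang))]
    vis_ctx = [weighted_sum(softmax(col), visual) for col in columns]

    # Parallel residual update: both contexts come from the ORIGINAL features,
    # so neither modality sees the other's already-updated representation.
    new_visual = [[a + b for a, b in zip(v, c)] for v, c in zip(visual, lang_ctx)]
    new_lang = [[a + b for a, b in zip(w, c)] for w, c in zip(lang, vis_ctx)]
    return new_visual, new_lang


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 positions, d = 2
    lang = [[1.0, 0.0], [0.0, 1.0]]                # 2 words, d = 2
    v2, l2 = co_attention(visual, lang)
    print(len(v2), len(v2[0]), len(l2), len(l2[0]))  # 3 2 2 2
    print(v2[2])  # [1.5, 1.5]
```

Because both context sets are computed from the same affinity matrix before either modality changes, the visual and linguistic features are updated simultaneously rather than one after the other. A practical network would add learned projections, normalization, and batched tensor operations on top of this idea.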
