CMF: Cascaded Multi-model Fusion for Referring Image Segmentation


June 16, 2021


Jianhua Yang, Yan Huang, Zhanyu Ma, Liang Wang

From arXiv; accepted by ICIP 2021

In this work, we address the task of referring image segmentation (RIS), which aims at predicting a segmentation mask for the object described by a natural language expression. Most existing methods focus on establishing unidirectional or directional relationships between visual and linguistic features to associate two modalities together, while the multi-scale context is ignored or insufficiently modeled. Multi-scale context is crucial to localize and segment those objects that have large scale variations during the multi-modal fusion process. To solve this problem, we propose a simple yet effective Cascaded Multi-modal Fusion (CMF) module, which stacks multiple atrous convolutional layers in parallel and further introduces a cascaded branch to fuse visual and linguistic features. The cascaded branch can progressively integrate multi-scale contextual information and facilitate the alignment of two modalities during the multi-modal fusion process. Experimental results on four benchmark datasets demonstrate that our method outperforms most state-of-the-art methods. Code is available at https://github.com/jianhua2022/CMF-Refseg.
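The contrast between parallel atrous layers and a cascaded branch can be illustrated with a small, self-contained sketch. This is not the authors' implementation: it works on a 1D signal in pure Python, uses fixed kernel weights, and leaves out the linguistic features entirely. The function names (`atrous_conv1d`, `parallel_branches`, `cascaded_branch`, `span`) and the cascade wiring (each branch receives the input plus the previous branch's output) are assumptions made for illustration only. The sketch shows one thing: chaining the branches widens the context that each later branch covers.

```python
from typing import List, Sequence


def atrous_conv1d(x: Sequence[float], kernel: Sequence[float], rate: int) -> List[float]:
    """1D atrous (dilated) convolution with zero padding; output has len(x) samples."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate  # kernel taps are spaced `rate` samples apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def parallel_branches(x: Sequence[float], kernel: Sequence[float],
                      rates: Sequence[int]) -> List[List[float]]:
    """Apply every dilation rate independently to the same input."""
    return [atrous_conv1d(x, kernel, r) for r in rates]


def cascaded_branch(x: Sequence[float], kernel: Sequence[float],
                    rates: Sequence[int]) -> List[List[float]]:
    """Hypothetical cascade: each branch sees the input plus the previous branch's output."""
    outputs = []
    prev = [0.0] * len(x)
    for r in rates:
        fused_in = [a + b for a, b in zip(x, prev)]
        prev = atrous_conv1d(fused_in, kernel, r)
        outputs.append(prev)
    return outputs


def span(signal: Sequence[float], eps: float = 1e-12) -> int:
    """Width between the first and last non-zero sample: a proxy for the receptive field."""
    idxs = [i for i, v in enumerate(signal) if abs(v) > eps]
    return idxs[-1] - idxs[0] + 1 if idxs else 0


if __name__ == "__main__":
    impulse = [0.0] * 41
    impulse[20] = 1.0  # a single active position in the middle of the signal
    kernel = [1.0, 1.0, 1.0]
    rates = [1, 2, 4]
    print([span(b) for b in parallel_branches(impulse, kernel, rates)])  # [3, 5, 9]
    print([span(b) for b in cascaded_branch(impulse, kernel, rates)])    # [3, 7, 15]
```

With an impulse input, each parallel branch covers only the span of its own dilation rate, while the cascaded outputs accumulate the context of the earlier branches. In a real model the layers would be learned 2D convolutions over fused visual and linguistic feature maps, which this sketch does not attempt to reproduce.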

