Salient Object Detection (SOD) is the task of predicting the regions of a given scene that attract human attention. Fusing depth information has proven effective for this task. The main challenge is how to aggregate the complementary information from the RGB modality and the depth modality. However, conventional deep models rely heavily on CNN feature extractors, and long-range contextual dependencies are usually ignored. In this work, we propose the Dual Swin-Transformer based Mutual Interactive Network (DTMINet). We adopt the Swin-Transformer as the feature extractor for both the RGB and depth modalities to model long-range dependencies in the visual inputs. Before fusing the two branches of features into one, attention-based modules are applied to enhance the features from each modality. We design a self-attention-based cross-modality interaction module and a gated modality attention module to exploit the complementary information between the two modalities. For saliency decoding, we build densely connected decoding stages that maintain a decoding memory while the multi-level encoder features are considered simultaneously. To mitigate the effect of inaccurate depth maps, we gather the early-stage RGB features in a skip convolution module so that the RGB modality provides additional guidance to the final saliency prediction. In addition, we add edge supervision to regularize the feature learning process. Comprehensive experiments on five standard RGB-D SOD benchmark datasets with four evaluation metrics demonstrate the superiority of the proposed DTMINet.
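To make the two fusion components more concrete, the sketch below illustrates, under assumptions, how a self-attention-based cross-modality interaction step and a gated modality attention step could be wired together with standard PyTorch layers. The module names `CrossModalityInteraction` and `GatedModalityAttention`, the token shapes, and all hyperparameters are hypothetical stand-ins for the modules named in the abstract, not the authors' released implementation; the Swin-Transformer backbones are replaced here by random token tensors for brevity.

```python
# Illustrative sketch only: cross-modality interaction via multi-head
# cross-attention, followed by a gated fusion of the two feature streams.
import torch
import torch.nn as nn


class CrossModalityInteraction(nn.Module):
    """Each modality queries the other with multi-head attention (assumed design)."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.rgb_from_depth = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.depth_from_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_rgb = nn.LayerNorm(dim)
        self.norm_depth = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, depth_tokens):
        # Cross-attend in both directions and keep residual paths.
        r, _ = self.rgb_from_depth(rgb_tokens, depth_tokens, depth_tokens)
        d, _ = self.depth_from_rgb(depth_tokens, rgb_tokens, rgb_tokens)
        return self.norm_rgb(rgb_tokens + r), self.norm_depth(depth_tokens + d)


class GatedModalityAttention(nn.Module):
    """Learned gate that weighs the RGB and depth streams per token (assumed design)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, rgb_tokens, depth_tokens):
        g = self.gate(torch.cat([rgb_tokens, depth_tokens], dim=-1))
        return g * rgb_tokens + (1.0 - g) * depth_tokens


if __name__ == "__main__":
    B, N, C = 2, 196, 96              # batch, tokens (14x14 patches), channels
    rgb = torch.randn(B, N, C)        # stand-in for one RGB-branch encoder stage
    depth = torch.randn(B, N, C)      # stand-in for one depth-branch encoder stage
    rgb_e, depth_e = CrossModalityInteraction(C)(rgb, depth)
    fused = GatedModalityAttention(C)(rgb_e, depth_e)
    print(fused.shape)                # torch.Size([2, 196, 96])
```

In this reading, the cross-attention step enhances each modality with context from the other before fusion, and the sigmoid gate then decides, per channel and token, how much each enhanced stream contributes to the fused feature passed to the decoder.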