Vision Transformers have recently shown great promise for many vision tasks, owing to their insightful architecture design and attention mechanism. By revisiting the self-attention responses in Transformers, we empirically observe two interesting issues. First, Vision Transformers present a query-irrelevant behavior at deep layers: the attention maps exhibit nearly identical global contexts regardless of the query patch position (and are also head-irrelevant). Second, the attention maps are intrinsically sparse, with a few tokens dominating the attention weights; introducing knowledge from ConvNets largely smooths the attention and enhances performance. Motivated by the above observations, we generalize the self-attention formulation to abstract a query-irrelevant global context directly and further integrate this global context into convolutions. The resulting model, a Fully Convolutional Vision Transformer (FCViT), consists purely of convolutional layers and firmly inherits the merits of both the attention mechanism and convolutions, including the dynamic property, weight sharing, and short- and long-range feature modeling. Experimental results demonstrate the effectiveness of FCViT. With fewer than 14M parameters, our FCViT-S12 outperforms the related work ResT-Lite by 3.7% top-1 accuracy on ImageNet-1K. When scaling FCViT up to larger models, it still outperforms the previous state-of-the-art ConvNeXt with even fewer parameters. FCViT-based models also demonstrate promising transferability to downstream tasks, such as object detection, instance segmentation, and semantic segmentation. Code and models are available at: https://github.com/ma-xu/FCViT.
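To make the core idea concrete, below is a minimal PyTorch sketch of a query-irrelevant global context fused with convolutions. This is an illustrative assumption, not the authors' actual FCViT block: a single context vector is computed from the mean token, broadcast to every spatial position, and combined with a depthwise-convolution branch for local modeling. All module and variable names (e.g., GlobalContextConvBlock, dwconv, fuse) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalContextConvBlock(nn.Module):
    """Hypothetical sketch: query-irrelevant global context + convolutions.

    Instead of per-query attention maps, a single global context vector is
    computed from all tokens (shared by every query position), broadcast back
    to the feature map, and mixed with a depthwise convolution branch.
    """

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)                               # token projection (assumed)
        self.dwconv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)   # local (short-range) branch
        self.fuse = nn.Conv2d(dim, dim, 1)                            # pointwise fusion

    def forward(self, x):
        # x: (B, C, H, W) feature map
        B, C, H, W = x.shape
        tokens = x.flatten(2).transpose(1, 2)                         # (B, N, C), N = H*W

        # Query-irrelevant global context: each token is weighted by its
        # similarity to the pooled (mean) token, independent of any query.
        pooled = tokens.mean(dim=1, keepdim=True)                     # (B, 1, C)
        sim = F.softmax(tokens @ pooled.transpose(1, 2) / C ** 0.5, dim=1)  # (B, N, 1)
        context = (sim * self.proj(tokens)).sum(dim=1, keepdim=True)        # (B, 1, C)

        # Broadcast the global context to all positions and fuse it with the
        # local depthwise-convolution branch.
        global_branch = context.transpose(1, 2).reshape(B, C, 1, 1).expand(B, C, H, W)
        local_branch = self.dwconv(x)
        return self.fuse(local_branch + global_branch)


if __name__ == "__main__":
    block = GlobalContextConvBlock(dim=64)
    out = block(torch.randn(2, 64, 14, 14))
    print(out.shape)  # torch.Size([2, 64, 14, 14])
```

Note that this sketch still uses a linear projection for clarity; the paper's actual model is described as purely convolutional, so the real block presumably realizes the same idea with convolutional operators only.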