DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition - Zhuanzhi Papers


February 3, 2023

DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition


Jiayu Jiao, Yu-Ming Tang, Kun-Yu Lin, Yipeng Gao, Jinhua Ma, Yaowei Wang, Wei-Shi Zheng

From arXiv. Accepted to IEEE Transactions on Multimedia, 2023 (submission date: 22-Sep-2022).

As a de facto solution, vanilla Vision Transformers (ViTs) are encouraged to model long-range dependencies between arbitrary image patches, but the global attended receptive field leads to quadratic computational cost. Another branch of Vision Transformers exploits local attention inspired by CNNs, which models only the interactions between patches in small neighborhoods. Although such a solution reduces the computational cost, it naturally suffers from small attended receptive fields, which may limit performance. In this work, we explore effective Vision Transformers to pursue a preferable trade-off between computational complexity and the size of the attended receptive field. By analyzing the patch interaction of global attention in ViTs, we observe two key properties in the shallow layers, namely locality and sparsity, indicating the redundancy of global dependency modeling in the shallow layers of ViTs. Accordingly, we propose Multi-Scale Dilated Attention (MSDA) to model local and sparse patch interaction within the sliding window. With a pyramid architecture, we construct a Multi-Scale Dilated Transformer (DilateFormer) by stacking MSDA blocks at low-level stages and global multi-head self-attention blocks at high-level stages. Our experimental results show that DilateFormer achieves state-of-the-art performance on various vision tasks. On the ImageNet-1K classification task, DilateFormer achieves comparable performance with 70% fewer FLOPs than existing state-of-the-art models. Our DilateFormer-Base achieves 85.6% top-1 accuracy on the ImageNet-1K classification task, 53.5% box mAP / 46.1% mask mAP on the COCO object detection / instance segmentation task, and 51.1% MS mIoU on the ADE20K semantic segmentation task.
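To make the idea of local and sparse patch interaction within a sliding window more concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It rests on several assumptions that the abstract does not spell out: each query patch attends to a `kernel_size` x `kernel_size` window sampled every `dilation` patches, positions outside the feature map are simply skipped, and "multi-scale" is taken to mean that different channel groups use different dilation rates. All function names and parameters (`dilated_window`, `dilated_attention`, `multi_scale_dilated_attention`, `dilations`) are hypothetical.

```python
import math


def dilated_window(i, j, height, width, kernel_size=3, dilation=1):
    """Coordinates of the patches that the query at (i, j) attends to.

    Assumption: positions that fall outside the feature map are skipped.
    """
    half = kernel_size // 2
    coords = []
    for di in range(-half, half + 1):
        for dj in range(-half, half + 1):
            r, c = i + di * dilation, j + dj * dilation
            if 0 <= r < height and 0 <= c < width:
                coords.append((r, c))
    return coords


def dilated_attention(q, k, v, kernel_size=3, dilation=1):
    """Sliding-window dilated attention on H x W x d nested lists."""
    height, width, dim = len(q), len(q[0]), len(q[0][0])
    scale = 1.0 / math.sqrt(dim)
    out = [[None] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            coords = dilated_window(i, j, height, width, kernel_size, dilation)
            scores = [scale * sum(a * b for a, b in zip(q[i][j], k[r][c]))
                      for r, c in coords]
            top = max(scores)  # subtract the max for a stable softmax
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            out[i][j] = [
                sum(w * v[r][c][ch] for w, (r, c) in zip(weights, coords)) / total
                for ch in range(dim)
            ]
    return out


def slice_channels(t, lo, hi):
    """Keep channels lo..hi-1 of every patch vector."""
    return [[vec[lo:hi] for vec in row] for row in t]


def multi_scale_dilated_attention(q, k, v, dilations=(1, 2), kernel_size=3):
    """Assumed multi-scale scheme: one dilation rate per channel group."""
    height, width, dim = len(q), len(q[0]), len(q[0][0])
    group = dim // len(dilations)
    if group * len(dilations) != dim:
        raise ValueError("channel count must be divisible by len(dilations)")
    out = [[[] for _ in range(width)] for _ in range(height)]
    for g, rate in enumerate(dilations):
        lo, hi = g * group, (g + 1) * group
        part = dilated_attention(slice_channels(q, lo, hi),
                                 slice_channels(k, lo, hi),
                                 slice_channels(v, lo, hi),
                                 kernel_size, rate)
        for i in range(height):
            for j in range(width):
                out[i][j].extend(part[i][j])  # concatenate the groups
    return out
```

In this sketch every query looks at no more than `kernel_size ** 2` keys, so the number of attended pairs grows linearly with the number of patches rather than quadratically as in global attention, while a larger dilation rate widens the receptive field without adding keys.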

