BOAT: Bilateral Local Attention Vision Transformer - 专知 Papers


January 31, 2022


Tan Yu, Gangming Zhao, Ping Li, Yizhou Yu

Vision Transformers have achieved outstanding performance in many computer vision tasks. Early Vision Transformers such as ViT and DeiT adopt global self-attention, which is computationally expensive when the number of patches is large. To improve efficiency, recent Vision Transformers adopt local self-attention mechanisms, where self-attention is computed within local windows. Although window-based local self-attention significantly boosts efficiency, it fails to capture the relationships between distant but similar patches in the image plane. To overcome this limitation of image-space local attention, in this paper we further exploit the locality of patches in the feature space. We group the patches into multiple clusters using their features, and self-attention is computed within every cluster. Such feature-space local attention effectively captures the connections between patches that lie in different local windows but are still relevant to each other. We propose a Bilateral lOcal Attention vision Transformer (BOAT), which integrates feature-space local attention with image-space local attention. We further integrate BOAT with both Swin and CSWin models, and extensive experiments on several benchmark datasets demonstrate that our BOAT-CSWin model clearly and consistently outperforms existing state-of-the-art CNN models and vision Transformers.
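The two kinds of locality described in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The abstract does not say how patches are clustered or how the two attentions are combined, so the sketch makes these assumptions: `feature_groups` is a toy stand-in that sorts patches by their first feature coordinate and cuts the order into equal-sized clusters, `bilateral_block` simply applies image-space attention followed by feature-space attention, and attention is single-head with identity query/key/value projections. All function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def grouped_attention(tokens, groups):
    """Run self-attention independently inside each group of token indices."""
    out = [None] * len(tokens)
    for idx in groups:
        attended = self_attention([tokens[i] for i in idx])
        for i, vec in zip(idx, attended):
            out[i] = vec
    return out


def window_groups(height, width, win):
    """Image-space locality: non-overlapping win x win windows on a patch grid."""
    groups = {}
    for r in range(height):
        for c in range(width):
            groups.setdefault((r // win, c // win), []).append(r * width + c)
    return list(groups.values())


def feature_groups(tokens, num_clusters):
    """Feature-space locality, toy stand-in (assumption): sort patches by their
    first feature coordinate and cut the order into equal-sized clusters."""
    order = sorted(range(len(tokens)), key=lambda i: tokens[i][0])
    size = math.ceil(len(tokens) / num_clusters)
    return [order[s:s + size] for s in range(0, len(order), size)]


def bilateral_block(tokens, height, width, win, num_clusters):
    """Image-space local attention followed by feature-space local attention.
    The sequential ordering is an assumption of this sketch."""
    x = grouped_attention(tokens, window_groups(height, width, win))
    return grouped_attention(x, feature_groups(x, num_clusters))


if __name__ == "__main__":
    # 4 x 4 patch grid; patches 0 and 15 are far apart but share similar features.
    tokens = [[0.0, i / 16] for i in range(16)]
    tokens[0] = [5.0, 1.0]
    tokens[15] = [5.0, 1.0]
    print(window_groups(4, 4, 2))
    print(feature_groups(tokens, 4))
    print(len(bilateral_block(tokens, 4, 4, 2, 4)))
```

In the demo, patches 0 and 15 sit in opposite corners of the grid, so no window contains both, yet the feature-space grouping places them in the same cluster. That is the kind of distant-but-similar relationship the abstract says window attention misses.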

