Vision Transformers have achieved outstanding performance on many computer vision tasks. Early Vision Transformers such as ViT and DeiT adopt global self-attention, which is computationally expensive when the number of patches is large. To improve efficiency, recent Vision Transformers adopt local self-attention mechanisms, in which self-attention is computed within local windows. Although window-based local self-attention significantly boosts efficiency, it fails to capture relationships between distant but similar patches in the image plane. To overcome this limitation of image-space local attention, in this paper we further exploit the locality of patches in the feature space: we group patches into multiple clusters according to their features and compute self-attention within each cluster. Such feature-space local attention effectively captures connections between patches that lie in different local windows but are still relevant to each other. We propose a Bilateral lOcal Attention vision Transformer (BOAT), which integrates feature-space local attention with image-space local attention. We further integrate BOAT with both Swin and CSWin models, and extensive experiments on several benchmark datasets demonstrate that our BOAT-CSWin model clearly and consistently outperforms existing state-of-the-art CNN models and vision Transformers.
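The core idea of feature-space local attention can be illustrated with a minimal sketch: partition patches into balanced clusters by feature similarity, then run scaled dot-product self-attention within each cluster. The sketch below is hypothetical and greatly simplified, not the paper's actual method; it uses a random-projection sort as a stand-in for balanced feature clustering, and shares the query/key/value projections for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def feature_space_local_attention(x, num_clusters=4):
    """Toy sketch of feature-space local attention.

    x: (n, d) patch features; n must be divisible by num_clusters.
    Patches are grouped into balanced clusters by feature similarity
    (here approximated by sorting along a random projection, a stand-in
    for the balanced clustering used in the actual model), and
    self-attention is computed independently within each cluster.
    """
    n, d = x.shape
    rng = np.random.default_rng(0)
    proj = x @ rng.standard_normal(d)        # 1-D projection of features
    order = np.argsort(proj)                 # similar patches become neighbors
    out = np.empty_like(x)
    size = n // num_clusters
    for c in range(num_clusters):
        idx = order[c * size:(c + 1) * size]
        q = k = v = x[idx]                   # shared Q/K/V projections for brevity
        attn = softmax(q @ k.T / np.sqrt(d)) # scaled dot-product attention
        out[idx] = attn @ v                  # attend only within this cluster
    return out
```

Because clusters are formed in feature space rather than on the image grid, two distant but similar patches can attend to each other, which is exactly what window-based image-space attention misses.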