Despite their recent success in many applications, the high computational requirements of vision transformers limit their use in resource-constrained settings. While many existing methods improve the quadratic complexity of attention, self-attention is not the major computation bottleneck in most vision transformers; e.g., more than 80% of the computation is spent on fully-connected layers. To improve the computational complexity of all layers, we propose a novel token downsampling method, called Token Pooling, which efficiently exploits redundancies in the images and intermediate token representations. We show that, under mild assumptions, softmax-attention acts as a high-dimensional low-pass (smoothing) filter. Thus, its output contains redundancy that can be pruned to achieve a better trade-off between computational cost and accuracy. Our new technique accurately approximates a set of tokens by minimizing the reconstruction error caused by downsampling, and we solve this optimization problem via cost-efficient clustering. We rigorously analyze Token Pooling and compare it to prior downsampling methods. Our experiments show that Token Pooling significantly improves the cost-accuracy trade-off over state-of-the-art downsampling. Token Pooling is a simple and effective operator that can benefit many architectures; applied to DeiT, it achieves the same ImageNet top-1 accuracy using 42% fewer computations.
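To illustrate the core idea, below is a minimal sketch (not the paper's exact implementation) of token downsampling via clustering: N tokens are replaced by K cluster centers, and the K-means objective being minimized is exactly the squared reconstruction error incurred when each token is approximated by its cluster center. The plain Lloyd's-iteration solver, the function name `token_pooling`, and the shapes used in the usage example are illustrative assumptions, not the paper's cost-efficient clustering scheme.

```python
import numpy as np

def token_pooling(tokens: np.ndarray, k: int, n_iters: int = 10) -> np.ndarray:
    """Downsample `tokens` of shape (N, d) to `k` pooled tokens of shape (k, d).

    Sketch only: plain K-means, whose objective equals the squared
    reconstruction error of approximating each token by its cluster center.
    """
    n, d = tokens.shape
    rng = np.random.default_rng(0)
    # Initialize the k centers with k distinct input tokens.
    centers = tokens[rng.choice(n, size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each token to its nearest center (squared Euclidean distance).
        dists = ((tokens[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, k)
        assign = dists.argmin(axis=1)
        # Move each center to the mean of its assigned tokens, which minimizes
        # the reconstruction error for the current assignment.
        for j in range(k):
            members = tokens[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers

# Usage (hypothetical shapes): pool 196 patch tokens of dimension 384 down to 98 tokens.
pooled = token_pooling(np.random.randn(196, 384).astype(np.float32), k=98)
```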