Vision Transformers (ViT) have demonstrated competitive performance advantages over convolutional neural networks (CNNs), though they often come with high computational costs. To this end, previous methods explore different structured attention patterns that restrict each token to attending a fixed number of spatially nearby tokens in order to accelerate the ViT's multi-head self-attention (MHSA) operations. However, such structured attention patterns limit token-to-token connections to their spatial relevance, disregarding the learned semantic connections captured by a full attention mask. In this work, we propose a novel approach to learn instance-dependent attention patterns by devising a lightweight connectivity predictor module that estimates the connectivity score of each pair of tokens. Intuitively, two tokens have a high connectivity score if their features are relevant either spatially or semantically. Because each token attends to only a small number of other tokens, the binarized connectivity masks are very sparse by nature and thus provide an opportunity to accelerate the network via sparse computations. Equipped with the learned unstructured attention pattern, our sparse-attention ViT (Sparsifiner) achieves a superior Pareto-optimal trade-off between FLOPs and top-1 accuracy on ImageNet compared to token sparsity. Our method reduces MHSA FLOPs by 48% to 69% with an accuracy drop within 0.4%. We also show that combining attention and token sparsity reduces ViT FLOPs by more than 60%.
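To make the idea concrete, the following is a minimal PyTorch-style sketch of an instance-dependent connectivity predictor followed by masked attention; it is not the authors' implementation, and the module name, low-rank projection size, and top-k binarization are assumptions made for illustration. The mask here is applied densely for clarity; an actual speedup would require sparse attention kernels.

```python
import torch
import torch.nn as nn

class ConnectivityPredictor(nn.Module):
    """Hypothetical lightweight module: scores token-pair connectivity and
    keeps only the top-k connections per query token (sparse binary mask)."""
    def __init__(self, dim, rank=32, topk=16):
        super().__init__()
        self.q_proj = nn.Linear(dim, rank)  # low-rank projections keep the predictor cheap
        self.k_proj = nn.Linear(dim, rank)
        self.topk = topk

    def forward(self, x):
        # x: (B, N, dim) token features
        q = self.q_proj(x)                             # (B, N, rank)
        k = self.k_proj(x)                             # (B, N, rank)
        scores = q @ k.transpose(-2, -1)               # (B, N, N) connectivity scores
        # Binarize: each token attends only to its top-k highest-scoring tokens,
        # so the resulting mask is sparse by construction.
        idx = scores.topk(self.topk, dim=-1).indices   # (B, N, topk)
        mask = torch.zeros_like(scores).scatter_(-1, idx, 1.0)
        return mask                                    # instance-dependent attention mask

def masked_attention(q, k, v, mask):
    # Standard scaled dot-product attention, with non-connected pairs
    # masked out before the softmax.
    attn = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    attn = attn.masked_fill(mask == 0, float("-inf")).softmax(dim=-1)
    return attn @ v
```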