Transformers yield state-of-the-art results across many tasks, but they still impose large computational costs during inference. We apply global structural pruning with latency-aware regularization to all parameters of the Vision Transformer (ViT) model to reduce inference latency. Furthermore, we analyze the pruned architectures and find consistent regularities in the final weight structure. These insights lead to a new architecture, NViT (Novel ViT), which redistributes parameters across the network. This architecture utilizes parameters more efficiently and enables control of the latency-accuracy trade-off. On ImageNet-1K, we prune the DEIT-Base (Touvron et al., 2021) model to a 2.6x FLOPs reduction, 5.1x parameter reduction, and 1.9x run-time speedup with only 0.07% loss in accuracy. We achieve more than 1% accuracy gain when compressing the base model to the throughput of the Small/Tiny variants. NViT gains 0.1-1.1% accuracy over the hand-designed DEIT family when trained from scratch, while being faster.
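To make the idea of latency-aware global pruning concrete, here is a minimal sketch, not the paper's implementation: prunable groups (e.g., attention heads or MLP channels) are ranked globally by importance per unit of latency saved, so low-importance groups that free up more latency are removed first. The function name `latency_aware_ranking` and all numeric values below are illustrative assumptions.

```python
import numpy as np

def latency_aware_ranking(saliency, latency_cost):
    # Score each prunable group by importance per unit latency cost.
    # Groups with low saliency but high latency savings get the
    # lowest scores and are therefore pruned first.
    score = np.asarray(saliency, dtype=float) / np.asarray(latency_cost, dtype=float)
    return np.argsort(score)  # ascending: earliest entries pruned first

# Hypothetical saliency and latency-cost values for four groups.
saliency = np.array([0.9, 0.1, 0.5, 0.05])
cost = np.array([1.0, 1.0, 2.0, 0.25])
order = latency_aware_ranking(saliency, cost)
```

In this sketch a purely parameter-count-based criterion would prune group 3 first (lowest saliency), whereas the latency-aware ratio instead prioritizes group 1, whose removal saves more latency per unit of importance lost.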