This paper studies the model compression problem of vision transformers. Benefiting from the self-attention module, transformer architectures have shown extraordinary performance on many computer vision tasks. Although network performance is boosted, transformers often require more computational resources, including memory usage and inference complexity. In contrast to existing knowledge distillation approaches, we propose to excavate useful information from the teacher transformer through the relationships between images and their divided patches. We then explore an efficient fine-grained manifold distillation approach that simultaneously calculates cross-image, cross-patch, and randomly-selected manifolds in the teacher and student models. Experimental results on several benchmarks demonstrate the superiority of the proposed algorithm for distilling portable transformer models with higher performance. For example, our approach achieves 75.06% Top-1 accuracy on the ImageNet-1k dataset when training a DeiT-Tiny model, outperforming other ViT distillation methods.
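To make the idea of matching cross-image, cross-patch, and randomly-selected manifolds concrete, the following is a minimal PyTorch sketch, not the paper's exact implementation. It assumes patch-level features `student_feat` and `teacher_feat` of shape (batch, patches, dim) are already extracted from corresponding layers; the function names and the sampling size `num_random` are illustrative choices.

```python
import torch
import torch.nn.functional as F

def manifold_relation(feat):
    """Similarity map over the last two axes of a (..., items, dim) tensor."""
    feat = F.normalize(feat, dim=-1)            # unit-length features
    return feat @ feat.transpose(-2, -1)        # (..., items, items) relation map

def manifold_distillation_loss(student_feat, teacher_feat, num_random=192):
    """Hedged sketch of a fine-grained manifold matching loss.

    student_feat: (B, N, D_s) patch features from the student layer.
    teacher_feat: (B, N, D_t) patch features from the teacher layer.
    Relation maps are dimension-free, so D_s and D_t may differ.
    """
    B, N, _ = student_feat.shape

    # Cross-patch (intra-image) relations: patch-to-patch similarity per image.
    intra = F.mse_loss(manifold_relation(student_feat),
                       manifold_relation(teacher_feat))

    # Cross-image relations: the same patch position compared across the batch.
    inter = F.mse_loss(manifold_relation(student_feat.transpose(0, 1)),
                       manifold_relation(teacher_feat.transpose(0, 1)))

    # Randomly-selected relations over all B*N patches to keep the cost manageable.
    idx = torch.randint(0, B * N, (num_random,), device=student_feat.device)
    s_rand = F.normalize(student_feat.reshape(B * N, -1)[idx], dim=-1)
    t_rand = F.normalize(teacher_feat.reshape(B * N, -1)[idx], dim=-1)
    rand = F.mse_loss(s_rand @ s_rand.T, t_rand @ t_rand.T)

    return intra + inter + rand
```

In this sketch the three terms are simply summed; in practice they would typically be weighted and added to the usual task and logit-distillation losses when training the student.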