Recent studies have pointed out that knowledge distillation (KD) suffers from two degradation problems, namely the teacher-student gap and the incompatibility with strong data augmentations, making it inapplicable to training state-of-the-art models, which rely on advanced augmentations. However, we observe that a key factor, i.e., the temperatures in the softmax functions that generate the probabilities of both the teacher and student models, was mostly overlooked in previous methods. With properly tuned temperatures, these degradation problems of KD can be largely mitigated. Instead of relying on a naive grid search, which transfers poorly, we propose Meta Knowledge Distillation (MKD) to meta-learn the distillation with learnable meta temperature parameters. The meta parameters are adaptively adjusted during training according to the gradients of the learning objective. We validate that MKD is robust to different dataset scales, different teacher/student architectures, and different types of data augmentation. With MKD, we achieve the best performance among compared methods that use only ImageNet-1K as training data, across popular ViT architectures ranging from tiny to large. With ViT-L, we achieve 86.5% with 600 epochs of training, 0.6% better than MAE, which trains for 1,650 epochs.
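To make the central idea concrete, below is a minimal sketch, assuming PyTorch, of a KD loss whose teacher and student temperatures are trainable parameters that receive gradients from the training objective. This is not the authors' code: the class name `LearnableTemperatureKD` and its arguments are hypothetical, and the full MKD procedure meta-learns the temperatures rather than simply optimizing them jointly with the student as done here.

```python
# Minimal sketch (not the authors' implementation) of a KD loss with
# learnable temperature parameters, assuming PyTorch. Names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableTemperatureKD(nn.Module):
    """KD loss whose teacher/student softmax temperatures are nn.Parameters,
    so they are adjusted by gradients of the distillation objective."""
    def __init__(self, init_teacher_t: float = 1.0, init_student_t: float = 1.0):
        super().__init__()
        # Learnable temperature parameters for teacher and student softmax.
        self.teacher_t = nn.Parameter(torch.tensor(init_teacher_t))
        self.student_t = nn.Parameter(torch.tensor(init_student_t))

    def forward(self, student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
        # Soften each distribution with its own temperature.
        t_prob = F.softmax(teacher_logits / self.teacher_t, dim=-1)
        s_logprob = F.log_softmax(student_logits / self.student_t, dim=-1)
        # Cross-entropy between the softened teacher and student distributions.
        return -(t_prob * s_logprob).sum(dim=-1).mean()

# Usage sketch: temperatures are optimized together with the student weights
# (in MKD proper they would instead be updated as meta parameters).
# kd = LearnableTemperatureKD()
# optimizer = torch.optim.AdamW(
#     list(student.parameters()) + list(kd.parameters()), lr=1e-3)
# loss = kd(student(x), teacher(x).detach())
```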