Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation - 专知 (Zhuanzhi) Papers


March 23, 2023

Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation


Zengyu Qiu, Xinzhu Ma, Kunlin Yang, Chunya Liu, Jun Hou, Shuai Yi, Wanli Ouyang

From arXiv; accepted at ICLR 2023

Knowledge distillation (KD) has shown very promising capabilities in transferring learning representations from large models (teachers) to small models (students). However, as the capacity gap between students and teachers becomes larger, existing KD methods fail to achieve better results. Our work shows that "prior knowledge" is vital to KD, especially when applying large teachers. In particular, we propose dynamic prior knowledge (DPK), which integrates part of the teacher's features as prior knowledge before feature distillation. This means that our method also takes the teacher's features as "input", not just as "target". In addition, we dynamically adjust the ratio of the prior knowledge during the training phase according to the feature gap, thus guiding the student at an appropriate difficulty. To evaluate the proposed method, we conduct extensive experiments on two image classification benchmarks (i.e., CIFAR100 and ImageNet) and an object detection benchmark (i.e., MS COCO). The results demonstrate the superiority of our method under varying settings. Moreover, DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers. More importantly, DPK provides a fast solution for teacher model selection for any given model.
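The two ideas in the abstract above, using part of the teacher's features as input and tying the amount of that prior knowledge to the feature gap, can be illustrated with a small sketch. This is not the authors' implementation. The function names, the relative-squared-error gap measure, the random choice of positions, and the rule "ratio equals gap" are all assumptions made for illustration; the abstract does not specify them. Features are plain lists of per-token vectors, and any layers between the mixing step and the loss are omitted.

```python
import random


def feature_gap(student, teacher):
    """Hypothetical gap measure in [0, 1]: relative squared error (an assumption)."""
    num = sum((s - t) ** 2
              for s_row, t_row in zip(student, teacher)
              for s, t in zip(s_row, t_row))
    den = sum(t ** 2 for t_row in teacher for t in t_row) or 1.0
    return min(1.0, num / den)


def mix_prior(student, teacher, ratio, rng):
    """Replace a `ratio` fraction of student token features with the teacher's.

    Returns the mixed features and the set of positions taken from the teacher.
    Positions are chosen uniformly at random (an assumption).
    """
    n = len(student)
    k = round(ratio * n)
    prior_idx = set(rng.sample(range(n), k))
    hybrid = [list(teacher[i]) if i in prior_idx else list(student[i])
              for i in range(n)]
    return hybrid, prior_idx


def mse(a, b):
    """Mean squared error over all feature entries."""
    count = sum(len(row) for row in a)
    return sum((x - y) ** 2
               for a_row, b_row in zip(a, b)
               for x, y in zip(a_row, b_row)) / count


# Toy data: 8 tokens with 4 channels each; the student is a noisy teacher copy.
rng = random.Random(0)
teacher = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
student = [[t + rng.gauss(0.0, 0.5) for t in row] for row in teacher]

# Larger gap -> more teacher features handed to the student as prior knowledge.
ratio = feature_gap(student, teacher)
hybrid, prior_idx = mix_prior(student, teacher, ratio, rng)
loss = mse(hybrid, teacher)
```

In this sketch a student that is far from the teacher receives more teacher features, and the share shrinks as the gap closes during training, which is one simple way to read "guiding the student at an appropriate difficulty". A real training loop would recompute `ratio` per batch and backpropagate through the student's positions only.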

