Knowledge distillation (KD), best known as an effective method for model compression, aims at transferring the knowledge of a larger network (teacher) to a much smaller network (student). Conventional KD methods usually employ a teacher model trained in a supervised manner, where output labels are treated only as targets. Extending this supervised scheme further, we introduce a new type of teacher model for KD, namely Oracle Teacher, that utilizes the embeddings of both the source inputs and the output labels to extract more accurate knowledge to be transferred to the student. One potential risk of the proposed approach is a trivial solution in which the model's output simply copies the oracle input. For Oracle Teacher, we present a training scheme that effectively prevents this trivial solution and thus enables the model to utilize both source inputs and oracle inputs during training. Extensive experiments are conducted on three different sequence learning tasks: speech recognition, scene text recognition, and machine translation. The experimental results empirically show that the proposed model improves the student models across these tasks while achieving a considerable speed-up in the teacher model's training time.
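As a rough illustration of the setup described above, the sketch below shows a toy oracle-style teacher that embeds both the source input and the ground-truth labels, together with a standard soft-label distillation loss for the student. All class, function, and parameter names here are hypothetical; the fusion details and the training scheme that prevents the teacher from copying the oracle input are omitted, so this is only a minimal sketch under those assumptions, not the authors' implementation.

```python
# Minimal sketch (hypothetical, not the paper's implementation): an
# oracle-style teacher that conditions on both the source input and an
# embedding of the ground-truth labels, plus a standard KD loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OracleTeacher(nn.Module):
    """Teacher that sees source features *and* the ground-truth labels."""

    def __init__(self, feat_dim, vocab_size, hidden_dim=256):
        super().__init__()
        self.src_proj = nn.Linear(feat_dim, hidden_dim)        # encode source input
        self.label_emb = nn.Embedding(vocab_size, hidden_dim)  # encode oracle labels
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src, labels):
        # Fuse source and label representations; the paper's training scheme
        # (not reproduced here) is what keeps the output from trivially
        # copying `labels`.
        h = self.src_proj(src) + self.label_emb(labels)
        return self.out(torch.tanh(h))  # teacher logits


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Conventional soft-label distillation loss for the student."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
```

In this toy setup the student would be any smaller network trained on the source input alone, with `kd_loss` added to its usual supervised objective so that the oracle teacher's soft targets guide its training.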