Knowledge distillation (KD), best known as an effective method for model compression, aims at transferring the knowledge of a bigger network (teacher) to a much smaller network (student). Conventional KD methods usually employ a teacher model trained in a supervised manner, where the output labels are treated only as targets. Extending this supervised scheme further, we introduce a new type of teacher model for connectionist temporal classification (CTC)-based sequence models, namely the Oracle Teacher, which leverages both the source inputs and the output labels as input to the teacher model. Since the Oracle Teacher learns a more accurate CTC alignment by referring to the target information, it can provide the student with better guidance. One potential risk of the proposed approach is a trivial solution in which the model's output simply copies the target input. Based on the many-to-one mapping property of the CTC algorithm, we present a training strategy that effectively prevents this trivial solution and thus enables the use of both source and target inputs for model training. Extensive experiments are conducted on two sequence learning tasks: speech recognition and scene text recognition. The experimental results show that the proposed model improves the students across these tasks while achieving a considerable speed-up in the teacher model's training time.
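To make the many-to-one mapping property referenced above concrete, here is a minimal sketch (not taken from the paper) of the standard CTC collapse function, often written as B: a frame-level alignment is reduced to a label sequence by merging consecutive repeated symbols and then removing blanks, so many distinct alignments map to the same target. The blank symbol and function name below are illustrative choices, not identifiers from the paper.

```python
# Sketch of the CTC many-to-one collapse mapping B (illustrative, not the paper's code):
# merge consecutive repeated symbols, then drop the blank token.
from itertools import groupby

BLANK = "-"  # illustrative blank symbol

def ctc_collapse(path):
    """Apply the CTC mapping B: merge consecutive repeats, then remove blanks."""
    merged = [symbol for symbol, _ in groupby(path)]
    return [symbol for symbol in merged if symbol != BLANK]

# Many distinct frame-level alignments collapse to the same label sequence:
print(ctc_collapse(list("cc-aa-t")))   # ['c', 'a', 't']
print(ctc_collapse(list("c-a--tt-")))  # ['c', 'a', 't']
```

Because many alignments collapse to one target, a teacher constrained to emit valid CTC alignments cannot simply echo the target sequence frame by frame, which is the intuition behind preventing the trivial copying solution.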