Knowledge distillation has been widely used for model compression and domain adaptation in speech applications. In the presence of multiple teachers, knowledge can easily be transferred to the student by averaging the models' outputs. However, previous research shows that the student does not adapt well with such a combination. This paper proposes an elitist sampling strategy at the output of the ensemble of teacher models, which selects the best-decoded utterance generated by completely out-of-domain teacher models in order to generalize to an unseen domain. The teacher models are trained on AMI, LibriSpeech and WSJ, while the student is adapted to the Switchboard data. The results show that, with the selection strategy based on the individual models' posteriors, the student model achieves a better WER than all the teachers and baselines, with a minimum absolute improvement of about 8.4 percent. Furthermore, insights into model adaptation with out-of-domain data are also provided via correlation analysis.
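As a concrete illustration of the selection step, the following is a minimal sketch, assuming each teacher supplies its one-best hypothesis together with the frame-level posteriors of that decoding, and that confidence is measured by the length-normalised log-posterior; the function and data names are hypothetical and this is not the paper's exact procedure.

```python
import numpy as np

def select_elite_hypothesis(teacher_outputs):
    """Pick the most confident teacher decoding for one utterance.

    teacher_outputs: list of (hypothesis_text, frame_posteriors) pairs, one per
    teacher, where frame_posteriors is a 1-D array of the posterior probability
    assigned to the decoded token/state at each frame.
    """
    best_hyp, best_score = None, -np.inf
    for hyp, posteriors in teacher_outputs:
        # Length-normalised log-posterior as a per-utterance confidence score.
        score = np.mean(np.log(np.clip(posteriors, 1e-10, 1.0)))
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp, best_score

# Illustrative use: three out-of-domain teachers (e.g. trained on AMI,
# LibriSpeech, WSJ) decode one Switchboard utterance; the most confident
# decoding is kept as the pseudo-label for the student.
outputs = [
    ("yeah i think so",    np.array([0.91, 0.85, 0.88, 0.90])),
    ("yeah i think so uh", np.array([0.70, 0.65, 0.72, 0.60, 0.55])),
    ("yes i think so",     np.array([0.80, 0.78, 0.83, 0.81])),
]
print(select_elite_hypothesis(outputs))
```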