Recent advances in deep learning have brought considerable improvements to end-to-end speech recognition, simplifying the traditional pipeline while producing promising results. Among end-to-end models, the connectionist temporal classification (CTC)-based model has attracted research interest due to its non-autoregressive nature. However, such CTC models require heavy computation to achieve outstanding performance. To mitigate this computational burden, we propose a simple yet effective knowledge distillation (KD) method for the CTC framework, namely Inter-KD, which additionally transfers the teacher's knowledge to the intermediate CTC layers of the student network. Experimental results on LibriSpeech verify that Inter-KD outperforms conventional KD methods. Without using any language model (LM) or data augmentation, Inter-KD improves the word error rate (WER) from 8.85% to 6.30% on test-clean.
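The abstract only sketches the objective, so the snippet below is a minimal, hedged illustration of how a combined loss with intermediate-layer distillation could look in PyTorch. It assumes frame-level KL distillation from the teacher's final CTC posteriors to the student's final and auxiliary CTC heads; the function name, the choice of KL divergence, the temperature, and the loss weighting are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of an intermediate-layer KD loss for a CTC student.
# All names and weighting choices are hypothetical, not the authors' code.
import torch
import torch.nn.functional as F

def inter_kd_loss(student_inter_logits, student_final_logits, teacher_logits,
                  targets, input_lengths, target_lengths,
                  alpha=0.5, temperature=1.0):
    """student_inter_logits: list of (T, N, C) logits from intermediate CTC heads.
    student_final_logits / teacher_logits: (T, N, C) logits from the final layer.
    targets, input_lengths, target_lengths: standard CTC supervision."""
    # Hard CTC loss on the student's final output.
    log_probs = F.log_softmax(student_final_logits, dim=-1)
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=0, zero_infinity=True)

    # Frame-level KL divergence between the teacher's posteriors and every
    # student CTC head (final + intermediate), i.e. the "Inter" part.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    kd = 0.0
    for logits in [student_final_logits, *student_inter_logits]:
        student_log_probs = F.log_softmax(logits / temperature, dim=-1)
        kd = kd + F.kl_div(student_log_probs, teacher_probs,
                           reduction="batchmean") * temperature ** 2
    kd = kd / (1 + len(student_inter_logits))

    # Weighted sum of the hard CTC loss and the distillation loss.
    return (1 - alpha) * ctc + alpha * kd
```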