Self-supervised learning (SSL) speech pre-trained models perform well across various speech processing tasks. Distilled versions of SSL models have been developed to meet the needs of on-device speech applications. Although they perform similarly to the original SSL models, the distilled counterparts suffer even greater performance degradation than their originals in distorted environments. This paper proposes applying Cross-Distortion Mapping and Domain Adversarial Training to SSL models during knowledge distillation to narrow the performance gap caused by the domain mismatch problem. Results show consistent performance improvements under both in-domain and out-of-domain distortion setups across different downstream tasks while maintaining an efficient model size.
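The core idea of cross-distortion mapping can be illustrated with a toy sketch: during distillation, the student encoder receives a distorted input but is trained to match the frozen teacher's representation of the corresponding clean input, encouraging distortion-invariant features. This is a minimal illustrative example with toy linear encoders, not the paper's implementation; all names and hyperparameters here are our own assumptions.

```python
import numpy as np

# Toy sketch of cross-distortion mapping during knowledge distillation:
# the student sees DISTORTED input, the frozen teacher sees CLEAN input,
# and the student is trained to match the teacher's representation.
# Linear encoders stand in for the real SSL models (an assumption).

rng = np.random.default_rng(0)
d = 8
teacher_W = rng.normal(size=(d, d))        # frozen teacher encoder
student_W = rng.normal(size=(d, d)) * 0.1  # small student, to be distilled

def distill_step(student_W, x_clean, x_dist, lr=1e-2):
    """One SGD step on the loss ||student(x_dist) - teacher(x_clean)||^2."""
    target = teacher_W @ x_clean           # teacher encodes clean speech
    pred = student_W @ x_dist              # student encodes distorted speech
    err = pred - target
    grad = 2.0 * np.outer(err, x_dist)     # dL/dW for a linear student
    return student_W - lr * grad, float(err @ err)

losses = []
for step in range(500):
    x_clean = rng.normal(size=d)
    x_dist = x_clean + 0.3 * rng.normal(size=d)  # additive-noise distortion
    student_W, loss = distill_step(student_W, x_clean, x_dist)
    losses.append(loss)

print(f"mean loss, first 100 steps: {np.mean(losses[:100]):.3f}")
print(f"mean loss, last 100 steps:  {np.mean(losses[-100:]):.3f}")
```

In the full method, a domain-adversarial branch (typically a domain classifier behind a gradient reversal layer) would additionally push the student's features to be indistinguishable across clean and distorted domains; that component is omitted here for brevity.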