Noisy Student Training (NST) has recently demonstrated very strong performance in Automatic Speech Recognition (ASR). In this paper, we propose a data selection strategy named LM Filter to improve the performance of NST on non-target-domain data in ASR tasks. Hypotheses with and without a language model are generated, and the CER difference between them is used as the filtering threshold. Results show a significant relative improvement of 10.4% over the baseline with no data filtering. We achieve a 3.31% CER on the AISHELL-1 test set, which is, to the best of our knowledge, the best result obtained without any additional supervised data. We also evaluate on the supervised 1000-hour AISHELL-2 dataset, where a competitive CER of 4.72% is achieved.
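The LM Filter idea described above can be sketched as follows: decode each unlabeled utterance twice (once with and once without a language model), score the two hypotheses against each other with CER, and keep only utterances where the two decodings agree within a threshold. This is a minimal illustrative sketch; the field names, data layout, and threshold value are assumptions, not the paper's actual implementation.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein (edit) distance between two character sequences."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # one-row DP table
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            # deletion, insertion, substitution/match
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + cost)
            prev = cur
    return dp[n]

def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp measured against ref."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

def lm_filter(utterances, threshold=0.1):
    """Keep utterances whose with-LM and without-LM hypotheses
    agree, i.e. whose mutual CER is below the threshold.
    (Field names and threshold are hypothetical.)"""
    return [u for u in utterances
            if cer(u["hyp_lm"], u["hyp_no_lm"]) <= threshold]

# Example: the first utterance's decodings agree, the second's diverge.
utts = [
    {"id": "u1", "hyp_lm": "hello world", "hyp_no_lm": "hello world"},
    {"id": "u2", "hyp_lm": "hello world", "hyp_no_lm": "yellow curled"},
]
kept = lm_filter(utts, threshold=0.1)  # only "u1" survives the filter
```

Utterances whose hypotheses are stable with or without the LM are treated as reliable pseudo-labels for the next NST iteration, while divergent ones are discarded.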