Building inclusive speech recognition systems is a crucial step towards developing technologies that speakers of all language varieties can use. ASR systems must therefore work for everybody, regardless of how they speak. Accomplishing this goal requires data sets that represent language varieties, along with an understanding of which model configurations are most helpful for robust recognition of all types of speech. However, there are not enough data sets for accented speech, and for those that are already available, more training approaches need to be explored to improve the quality of accented speech recognition. In this paper, we discuss recent progress towards developing more inclusive ASR systems, namely the importance of building new data sets that represent linguistic diversity and of exploring novel training approaches to improve performance for all users. We address recent directions in benchmarking ASR systems for accented speech, measure the effects of wav2vec 2.0 pre-training on accented speech recognition, and highlight corpora relevant for diverse ASR evaluations.