We propose a new method for calculating error rates in Automatic Speech Recognition (ASR). The new metric targets languages that contain half characters and in which the same character can be written in multiple forms. We implement our methodology for Hindi, one of the main languages in the Indic context, and we believe the approach scales to other similar languages with large character sets. We call our metrics Alternate Word Error Rate (AWER) and Alternate Character Error Rate (ACER). We train our ASR models for Indic languages using wav2vec 2.0\cite{baevski2020wav2vec}, and additionally use language models to improve model performance. Our results show a significant improvement in the analysis of error rates at the word and character level, and the interpretability of the ASR system improves by up to $3$\% in AWER and $7$\% in ACER for Hindi. Our experiments suggest that in languages with complex pronunciation, words can be written in multiple ways without changing their meaning; in such cases AWER and ACER are more useful metrics than WER and CER. Further, we open-source a new 21-hour benchmarking dataset for Hindi along with scripts for the new metrics.
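To make the idea concrete, the following is a minimal sketch of how an "alternate" error rate could be computed, assuming the core notion described above: a hypothesis token that matches any accepted alternate written form of the reference token is not counted as an error. The function names, the \texttt{alternates} mapping, and the Devanagari example below are illustrative assumptions, not the paper's released scripts.

\begin{verbatim}
# Sketch of AWER/ACER: edit distance where a hypothesis token matching the
# reference token, or any of its listed alternate spellings, costs nothing.
from typing import Dict, List, Set


def _edit_distance(ref: List[str], hyp: List[str],
                   alternates: Dict[str, Set[str]]) -> int:
    """Levenshtein distance with zero substitution cost for alternate forms."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            same = (hyp[j - 1] == ref[i - 1]
                    or hyp[j - 1] in alternates.get(ref[i - 1], set()))
            d[i][j] = min(
                d[i - 1][j] + 1,                        # deletion
                d[i][j - 1] + 1,                        # insertion
                d[i - 1][j - 1] + (0 if same else 1),   # substitution
            )
    return d[len(ref)][len(hyp)]


def awer(reference: str, hypothesis: str,
         alternates: Dict[str, Set[str]]) -> float:
    """Alternate Word Error Rate: WER with alternate spellings allowed."""
    ref, hyp = reference.split(), hypothesis.split()
    return _edit_distance(ref, hyp, alternates) / max(len(ref), 1)


def acer(reference: str, hypothesis: str,
         alternates: Dict[str, Set[str]]) -> float:
    """Alternate Character Error Rate: the same idea at character level."""
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    return _edit_distance(ref, hyp, alternates) / max(len(ref), 1)


if __name__ == "__main__":
    # Illustrative alternates: two valid Devanagari renderings of one word
    # (with and without the nukta). Under plain WER this counts as an error;
    # under AWER it does not.
    alt = {"क़िला": {"किला"}}
    print(awer("क़िला पुराना है", "किला पुराना है", alt))  # -> 0.0
\end{verbatim}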