Automatic Speech Recognition (ASR), the task of converting audio into text, has recently attracted considerable attention in the machine learning community, and many publicly available models have been released on Hugging Face. However, most of these ASR models target English; only a minority support Thai. Additionally, most Thai ASR models are closed-source, and the existing open-source models lack robustness. To address this problem, we train a new ASR model from a pre-trained XLSR-Wav2Vec model using the Thai CommonVoice corpus V8, and we train a trigram language model to boost its performance. We hope that our models will benefit individuals and the ASR community in Thailand.