Although Turkish is counted among low-resource languages, the literature on Turkish automatic speech recognition (ASR) is relatively old. In this report, we present our findings on Turkish ASR with speech representation learning using HuBERT. We investigate pre-training HuBERT for Turkish on large-scale data curated from online resources, using 6,500 hours of speech from YouTube. The results show that the models are not ready for commercial use, since they are not robust to disturbances that typically occur in real-world settings, such as variations in accent, slang, background noise, and interference. We analyze typical errors and the limitations of the models for use in commercial settings.