Despite the success of deep learning in speech recognition, multi-dialect speech recognition remains a difficult problem. Although dialect-specific acoustic models are known to perform well in general, they are difficult to maintain when dialect-specific data is scarce and a language has many dialects. Therefore, a single unified acoustic model (AM) that generalizes well across many dialects has been in demand. In this paper, we propose a novel acoustic modeling technique for accurate multi-dialect speech recognition with a single AM. Our proposed AM is dynamically adapted based on both dialect information and its own internal representation, which results in a highly adaptive AM for handling multiple dialects simultaneously. We also propose a simple but effective training method to deal with unseen dialects. Experimental results on large-scale speech datasets show that the proposed AM outperforms all previous approaches, reducing the word error rate (WER) by 8.11% relative compared to a single all-dialects AM and by 7.31% relative compared to dialect-specific AMs.
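The abstract does not spell out the model's exact form, but the core idea of adapting a layer on both an external dialect signal and the layer's own internal representation can be illustrated with a FiLM-style conditioning block. The following is a minimal, hypothetical PyTorch sketch; the class name `DialectAdaptiveLayer`, the embedding size, the mean-pooled context, and the reserved "unknown" dialect id are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: a layer whose scale/shift parameters are generated
# from both a dialect embedding and the layer's own hidden activations.
# Illustrates the adaptation idea in the abstract, not the paper's model.
import torch
import torch.nn as nn

class DialectAdaptiveLayer(nn.Module):
    def __init__(self, hidden_dim: int, num_dialects: int, dialect_dim: int = 32):
        super().__init__()
        self.base = nn.Linear(hidden_dim, hidden_dim)        # shared weights
        self.dialect_emb = nn.Embedding(num_dialects, dialect_dim)
        # Adaptation network conditioned on the dialect embedding *and* a
        # summary of the current internal representation.
        self.adapt = nn.Linear(dialect_dim + hidden_dim, 2 * hidden_dim)

    def forward(self, h: torch.Tensor, dialect_id: torch.Tensor) -> torch.Tensor:
        # h: (batch, time, hidden_dim); dialect_id: (batch,)
        d = self.dialect_emb(dialect_id)                     # (batch, dialect_dim)
        ctx = h.mean(dim=1)                                  # internal-state summary
        scale, shift = self.adapt(torch.cat([d, ctx], dim=-1)).chunk(2, dim=-1)
        out = self.base(h)
        # FiLM-style modulation: dialect- and utterance-dependent scale/shift.
        return out * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

# Usage sketch. One plausible (assumed) way to handle unseen dialects is to
# reserve an "unknown" dialect id and train it by randomly replacing true
# dialect labels, so the model learns a dialect-agnostic fallback.
layer = DialectAdaptiveLayer(hidden_dim=256, num_dialects=5)
x = torch.randn(8, 100, 256)                                 # (batch, frames, features)
ids = torch.randint(0, 5, (8,))
y = layer(x, ids)
print(y.shape)  # torch.Size([8, 100, 256])
```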