Automatic speech recognition (ASR) is relevant in many settings because it gives applications and users a natural way to communicate. ASR systems often fail in environments that use language specific to a particular application domain. Several post-processing strategies have been explored to reduce errors in closed ASR systems, notably automatic spell checking and deep learning approaches. In this article, we explore using a deep neural network to refine the output of a phonetic correction algorithm applied to a telesales audio database. The results show a reduction in word error rate (WER) relative to both the original transcription and the phonetically corrected output, which demonstrates the viability of combining deep learning models with post-processing correction strategies to reduce the errors closed ASR systems make in domain-specific language.
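Since the results are reported as word error rate, a minimal sketch of the standard WER computation may be useful: the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the article, and the article's exact text normalization and scoring tooling are not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper for illustration; splits on whitespace only and
    applies no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("sat" -> "sit") and one deleted "the".
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. A correction stage can be judged by computing this value once for the raw transcription and once for the corrected output against the same reference.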