As more and more online search queries come from voice, automatic speech recognition (ASR) becomes a key component in delivering relevant search results. Errors introduced by ASR lead to irrelevant search results being returned to the user, causing user dissatisfaction. In this paper, we introduce an approach, Mondegreen, to correct voice queries in text space without depending on audio signals, which may not always be available due to system constraints, privacy, or bandwidth considerations (for example, some ASR systems run on-device). We focus on voice queries transcribed by several proprietary commercial ASR systems. These queries come from users making internet or online service searches. We first present an analysis showing how much the language distribution of user voice queries differs from that of the traditional text corpora used to train off-the-shelf ASR systems. We then demonstrate that Mondegreen can achieve significant improvements in user interaction by correcting user voice queries in one of the largest search systems at Google. Finally, we see Mondegreen as complementing existing highly optimized production ASR systems, which may not be retrained frequently and thus lag behind due to vocabulary drift.