Recognition errors are common in human communication, and similar errors often lead to unwanted behaviour in dialogue systems and virtual assistants. In human communication, we can recover from them by repeating misrecognized words or phrases; in human-machine communication, however, this recovery mechanism is not available. In this paper, we attempt to bridge this gap and present a system that allows a user to correct speech recognition errors in a virtual assistant by repeating misunderstood words. When a user repeats part of the phrase, the system rewrites the original query to incorporate the correction. This rewrite allows the virtual assistant to understand the original query successfully. We present an end-to-end 2-step attention pointer network that generates the rewritten query by merging the incorrectly understood utterance with the correction follow-up. We evaluate the model on data collected for this task and compare it to a rule-based baseline and a standard pointer network. We show that rewriting the original query is an effective way to handle repetition-based recovery and that the proposed model outperforms the rule-based baseline, reducing Word Error Rate by 19% relative at a 2% False Alarm Rate on annotated data.
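To make the pointer mechanism concrete, the following is a minimal, illustrative sketch of the copy-attention step a pointer network performs over the two input sequences (the misrecognized query and the correction follow-up). All names, toy vectors, and the dot-product scoring function here are assumptions for illustration, not the paper's actual architecture or parameters.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_step(dec_state, enc_states):
    """One pointer-attention step: score each source token against the
    decoder state and return a copy distribution over source positions."""
    scores = enc_states @ dec_state  # dot-product attention scores
    return softmax(scores)           # probability of copying each source token

# Toy setup: the misrecognized utterance and the repetition follow-up are
# encoded separately; a 2-step decoder would attend over each in turn and
# copy tokens from whichever source fits. (Random stand-in encoder states;
# a trained model would produce these from the token sequences.)
rng = np.random.default_rng(0)
query_tokens = ["play", "news"]        # misrecognized original query
correction_tokens = ["muse"]           # user's repetition follow-up
query_enc = rng.normal(size=(2, 4))    # stand-in encoder states, dim 4
corr_enc = rng.normal(size=(1, 4))
dec = rng.normal(size=4)               # stand-in decoder state

# Step 1: copy distribution over the original query tokens.
p_query = pointer_step(dec, query_enc)
# Step 2: copy distribution over the correction tokens.
p_corr = pointer_step(dec, corr_enc)
print(p_query, p_corr)
```

Each call yields a valid probability distribution over one source sequence; a full decoder would interpolate between the two distributions at every output step to assemble the rewritten query.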