Neural sequence-to-sequence systems deliver state-of-the-art performance for automatic speech recognition (ASR). When using appropriate modeling units, e.g., byte-pair encoded characters, these systems are in principle open-vocabulary systems. In practice, however, they often fail to recognize words not seen during training, e.g., named entities, numbers, or technical terms. To alleviate this problem, we supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory so that such words and phrases are recognized correctly. After the ASR system has been trained, and even once it has been deployed, a relevant word can be added to or removed from the memory instantly, without any further training. In this paper we demonstrate that, compared to a strong baseline, this mechanism enables our system to recognize more than 85% of newly added words that it previously failed to recognize.
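As a rough illustration of the idea of an external word/phrase memory that can be edited after deployment and queried at decode time, consider the following minimal sketch. It is not the authors' implementation: the class name, the fixed embedding vectors, and the dot-product softmax lookup are all assumptions made purely for illustration; a real system would embed phrases with the ASR model itself and fuse the retrieved entry into the decoder's output distribution.

```python
import numpy as np

class PhraseMemory:
    """Toy external word/phrase memory queried by attention (illustrative only)."""

    def __init__(self, dim):
        self.dim = dim
        self.entries = {}  # phrase -> embedding vector (hypothetical representation)

    def add(self, phrase, embedding):
        # Entries can be added after deployment, without retraining the ASR model.
        self.entries[phrase] = np.asarray(embedding, dtype=np.float32)

    def remove(self, phrase):
        # Entries can be removed just as easily.
        self.entries.pop(phrase, None)

    def lookup(self, decoder_state):
        # Scaled dot-product attention over all stored phrase embeddings.
        if not self.entries:
            return None, None
        phrases = list(self.entries)
        keys = np.stack([self.entries[p] for p in phrases])      # (N, dim)
        scores = keys @ decoder_state / np.sqrt(self.dim)        # (N,)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        best = phrases[int(weights.argmax())]
        return best, weights


# Usage: add a rare named entity at inference time, then query the memory.
rng = np.random.default_rng(0)
memory = PhraseMemory(dim=8)
memory.add("Karlsruhe", rng.normal(size=8))
memory.add("anaphylaxis", rng.normal(size=8))
# A decoder state close to the stored entry retrieves that phrase.
best, weights = memory.lookup(memory.entries["Karlsruhe"])
print(best, weights)
```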