Fast contextual adaptation has been shown to be effective in improving Automatic Speech Recognition (ASR) of rare words, and when combined with on-device personalized training, it can yield even better recognition results. However, traditional re-scoring approaches based on an external language model are prone to diverging during personalized training. In this work, we introduce a model-based end-to-end contextual adaptation approach that is decoder-agnostic and amenable to on-device personalization. Our on-device simulation experiments demonstrate that the proposed approach outperforms the traditional re-scoring technique by 12% in relative word error rate (WER) and by 15.7% in entity-mention-specific F1-score in a continuous personalization scenario.