Customizing a transducer-based automatic speech recognition (ASR) system with context information that is dynamic and unavailable during model training is challenging. In this work, we introduce a lightweight contextual spelling correction model to correct context-related recognition errors in transducer-based ASR systems. We incorporate the context information into the spelling correction model with a shared context encoder and use a filtering algorithm to handle large context lists. Experiments show that the model improves on the baseline ASR model with about 50% relative word error rate reduction and significantly outperforms baseline methods such as contextual LM biasing. The model also shows excellent performance for out-of-vocabulary terms not seen during training.
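For readers unfamiliar with the metric, the following minimal Python sketch illustrates how a relative word error rate (WER) reduction is typically computed. The function names and the example sentences are hypothetical illustrations, not taken from the paper's experiments, and the sketch assumes the standard definition of WER as word-level edit distance divided by reference length.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: a contact name is recovered after correction.
reference = "call joaquin about the kubernetes demo"
baseline = "call walking about the cooper demo"    # two substitutions
corrected = "call joaquin about the cooper demo"   # one substitution

baseline_wer = word_error_rate(reference, baseline)
corrected_wer = word_error_rate(reference, corrected)
print(relative_wer_reduction(baseline_wer, corrected_wer))
```

In this toy case the baseline makes two errors in six words and the corrected output makes one, so half of the baseline errors are removed. A relative reduction is a ratio of error rates, so it is not the same as an absolute drop of the same number of percentage points.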