End-to-end automatic speech recognition (ASR) systems often fail to transcribe domain-specific named entities, causing catastrophic failures in downstream tasks. Numerous fast and lightweight named entity correction (NEC) models have been proposed in recent years. These models, which mainly rely on phonetic-level edit distance algorithms, have shown impressive performance. However, when the forms of the wrongly transcribed word(s) and the ground-truth entity differ significantly, these methods often fail to locate the wrongly transcribed words in the hypothesis, which limits their usefulness. We propose a novel NEC method that uses speech sound features to retrieve candidate entities. Given the speech sound features and the candidate entities, we design a generative method that annotates entity errors in ASR transcripts and replaces the annotated text with the correct entities. This method remains effective when the word forms differ. We test our method on open-source and self-constructed test sets. The results demonstrate that our NEC method brings significant improvement in entity accuracy. The self-constructed training data and test set are publicly available at github.com/L6-NLP/Generative-Annotation-NEC.
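For illustration only, the following is a minimal sketch of the kind of edit-distance matching that prior NEC models rely on; it is not the generative method proposed here. It assumes lowercase character sequences as a crude stand-in for phoneme sequences, and the names `edit_distance`, `best_span`, and `max_words` are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_span(words: List[str], entity: str,
              max_words: int = 3) -> Tuple[int, int, float]:
    """Find the hypothesis word span closest to the entity.

    Returns (start, end, normalized_distance), with `end` exclusive.
    Characters stand in for phonemes here (an assumption of this sketch).
    """
    target = entity.lower().replace(" ", "")
    best = (0, 0, float("inf"))
    for start in range(len(words)):
        last = min(start + max_words, len(words))
        for end in range(start + 1, last + 1):
            cand = "".join(words[start:end]).lower()
            score = edit_distance(cand, target) / max(len(cand), len(target))
            if score < best[2]:
                best = (start, end, score)
    return best


hypothesis = "call doctor jon smyth tomorrow".split()
start, end, score = best_span(hypothesis, "John Smith")
print(hypothesis[start:end], round(score, 3))
```

Matching of this kind works when the misrecognized span still resembles the entity, as in the example. When the transcribed words share little surface form with the entity, no span scores well, which is the failure case this paper targets.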