Automatic Speech Recognition (ASR) error correction aims to fix recognition errors while preserving text that is already accurate. Although traditional approaches demonstrate moderate effectiveness, LLMs offer a paradigm that eliminates the need for training and labeled data. However, directly applying LLMs suffers from hallucination, which can cause correct text to be modified. To address this problem, we propose the Reliable LLM Correction Framework (RLLM-CF), which consists of three stages: (1) error pre-detection, (2) iterative correction via chain-of-thought sub-tasks, and (3) reasoning process verification. The advantage of our method is that it requires no additional information or model fine-tuning, and it ensures the correctness of the LLM's corrections through multi-pass verification. Experiments on AISHELL-1, AISHELL-2, and LibriSpeech show that the GPT-4o model enhanced by our framework achieves 21%, 11%, 9%, and 11.4% relative reductions in CER/WER.
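To make the three-stage pipeline concrete, the following is a minimal sketch of how the stages could compose, assuming a generic LLM query interface; all helper names (query_llm, pre_detect_errors, correct_with_cot, verify_reasoning), prompts, and the max_passes parameter are hypothetical illustrations, not the paper's implementation.

```python
# Hypothetical sketch of the RLLM-CF three-stage pipeline.
# All helpers and prompts below are illustrative assumptions.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM such as GPT-4o; plug in an API client here."""
    raise NotImplementedError

def pre_detect_errors(hypothesis: str) -> bool:
    # Stage 1: ask whether the ASR hypothesis contains recognition errors.
    # If none are flagged, the text is returned unchanged, avoiding
    # hallucinated edits to already-correct transcripts.
    answer = query_llm(
        "Does this ASR transcript contain recognition errors? "
        f"Answer yes or no.\nTranscript: {hypothesis}")
    return answer.strip().lower().startswith("yes")

def correct_with_cot(hypothesis: str) -> tuple[str, str]:
    # Stage 2: decompose correction into chain-of-thought sub-tasks
    # (locate errors, propose fixes, rewrite) and return both the
    # corrected text and the reasoning that produced it.
    reasoning = query_llm(
        "Step 1: list the likely misrecognized spans.\n"
        "Step 2: propose a correction for each span.\n"
        "Step 3: rewrite the transcript.\n"
        f"Transcript: {hypothesis}")
    corrected = query_llm(
        f"Extract only the final corrected transcript:\n{reasoning}")
    return corrected, reasoning

def verify_reasoning(corrected: str, reasoning: str) -> bool:
    # Stage 3: check that the chain of thought is self-consistent and
    # that the corrected text actually follows from it.
    verdict = query_llm(
        "Is this correction consistent with the reasoning? "
        f"Answer yes or no.\nReasoning: {reasoning}\nResult: {corrected}")
    return verdict.strip().lower().startswith("yes")

def rllm_cf(hypothesis: str, max_passes: int = 3) -> str:
    # Run correction and verification for up to max_passes rounds; only a
    # verified correction is emitted, otherwise the original text is kept.
    if not pre_detect_errors(hypothesis):
        return hypothesis
    for _ in range(max_passes):
        corrected, reasoning = correct_with_cot(hypothesis)
        if verify_reasoning(corrected, reasoning):
            return corrected
    return hypothesis  # fall back if verification never passes
```

Returning the unmodified hypothesis whenever pre-detection or verification fails is what, in this sketch, operationalizes the framework's goal of preserving accurate text rather than letting the LLM rewrite it.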