Automatic Speech Recognition (ASR) error correction aims to fix recognition errors while preserving text that is already accurate. Although traditional approaches demonstrate moderate effectiveness, LLMs offer a paradigm that eliminates the need for training and labeled data. However, directly applying LLMs suffers from hallucination, which can cause correct text to be modified. To address this problem, we propose the Reliable LLM Correction Framework (RLLM-CF), which consists of three stages: (1) error pre-detection, (2) iterative correction via chain-of-thought sub-tasks, and (3) reasoning process verification. The advantage of our method is that it requires no additional information or model fine-tuning, and it ensures the correctness of the LLM's corrections through multi-pass verification. Experiments on AISHELL-1, AISHELL-2, and LibriSpeech show that the GPT-4o model enhanced by our framework achieves 21%, 11%, 9%, and 11.4% relative reductions in CER/WER.
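To make the three-stage pipeline concrete, the following is a minimal sketch of how the stages could compose, assuming a generic LLM query interface; all helper names (query_llm, pre_detect_errors, correct_with_cot, verify_reasoning), prompts, and the max_passes parameter are hypothetical illustrations, not the paper's implementation.

```python
# Hypothetical sketch of the RLLM-CF three-stage pipeline.
# All helpers and prompts below are illustrative assumptions.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM such as GPT-4o; plug in an API client here."""
    raise NotImplementedError

def pre_detect_errors(hypothesis: str) -> bool:
    # Stage 1: ask whether the ASR hypothesis contains recognition errors.
    # If none are flagged, the text is returned unchanged, avoiding
    # hallucinated edits to already-correct transcripts.
    answer = query_llm(
        "Does this ASR transcript contain recognition errors? "
        f"Answer yes or no.\nTranscript: {hypothesis}")
    return answer.strip().lower().startswith("yes")

def correct_with_cot(hypothesis: str) -> tuple[str, str]:
    # Stage 2: decompose correction into chain-of-thought sub-tasks
    # (locate errors, propose fixes, rewrite) and return both the
    # corrected text and the reasoning that produced it.
    reasoning = query_llm(
        "Step 1: list the likely misrecognized spans.\n"
        "Step 2: propose a correction for each span.\n"
        "Step 3: rewrite the transcript.\n"
        f"Transcript: {hypothesis}")
    corrected = query_llm(
        f"Extract only the final corrected transcript:\n{reasoning}")
    return corrected, reasoning

def verify_reasoning(corrected: str, reasoning: str) -> bool:
    # Stage 3: check that the chain of thought is self-consistent and
    # that the corrected text actually follows from it.
    verdict = query_llm(
        "Is this correction consistent with the reasoning? "
        f"Answer yes or no.\nReasoning: {reasoning}\nResult: {corrected}")
    return verdict.strip().lower().startswith("yes")

def rllm_cf(hypothesis: str, max_passes: int = 3) -> str:
    # Run correction and verification for up to max_passes rounds; only a
    # verified correction is emitted, otherwise the original text is kept.
    if not pre_detect_errors(hypothesis):
        return hypothesis
    for _ in range(max_passes):
        corrected, reasoning = correct_with_cot(hypothesis)
        if verify_reasoning(corrected, reasoning):
            return corrected
    return hypothesis  # fall back if verification never passes
```

Returning the unmodified hypothesis whenever pre-detection or verification fails is what, in this sketch, operationalizes the framework's goal of preserving accurate text rather than letting the LLM rewrite it.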