FRED: Financial Retrieval-Enhanced Detection and Editing of Hallucinations in Language Models

Hallucinations in large language models pose a critical challenge for applications requiring factual reliability, particularly in high-stakes domains such as finance. This work presents an effective approach for detecting and editing factually incorrect content in model-generated responses based on the provided context. Given a user-defined domain-specific error taxonomy, we construct a synthetic dataset by inserting tagged errors into financial question-answering corpora and then fine-tune four language models, Phi-4, Phi-4-mini, Qwen3-4B, and Qwen3-14B, to detect and edit these factual inaccuracies. Our best-performing model, fine-tuned Phi-4, achieves an 8% improvement in binary F1 score and a 30% gain in overall detection performance compared to OpenAI-o3. Notably, our fine-tuned Phi-4-mini model, despite having only 4 billion parameters, maintains competitive performance with just a 2% drop in binary detection and a 0.1% decline in overall detection compared to OpenAI-o3. Our work provides a practical solution for detecting and editing factual inconsistencies in financial text generation while introducing a generalizable framework that can enhance the trustworthiness and alignment of large language models across diverse applications beyond finance. Our code and data are available at https://github.com/pegasi-ai/fine-grained-editting.
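The data-construction and evaluation steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `<error type=... fix=...>` tag format, the `numerical` error type, and the helper names `insert_error`, `strip_tags`, `apply_edits`, and `binary_f1` are all assumptions made for illustration. The sketch inserts one tagged error into a reference answer, recovers both the corrupted and the edited text from the tags, and computes a response-level binary F1 score.

```python
import re

# Hypothetical tag format (an assumption, not taken from the paper):
#   <error type="numerical" fix="4.2%">5.2%</error>
# The tagged span holds the wrong text; `fix` holds the original text.
TAG = re.compile(r'<error type="([^"]+)" fix="([^"]*)">(.*?)</error>')


def insert_error(text, correct, wrong, error_type):
    """Replace the first occurrence of `correct` with a tagged wrong span."""
    if correct not in text:
        raise ValueError(f"span {correct!r} not found in text")
    tagged = f'<error type="{error_type}" fix="{correct}">{wrong}</error>'
    return text.replace(correct, tagged, 1)


def strip_tags(tagged):
    """Corrupted response as a detector would see it: tags removed, errors kept."""
    return TAG.sub(lambda m: m.group(3), tagged)


def apply_edits(tagged):
    """Edited response: every tagged span replaced by its stored fix."""
    return TAG.sub(lambda m: m.group(2), tagged)


def binary_f1(gold, pred):
    """F1 over per-response labels (True = response contains an error)."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    answer = "Revenue grew 4.2% in 2023."
    tagged = insert_error(answer, "4.2%", "5.2%", "numerical")
    print(tagged)
    print(strip_tags(tagged))   # input for the detector
    print(apply_edits(tagged))  # target for the editor
    print(binary_f1([True, True, False, False], [True, False, True, False]))
```

Keeping the fix inside the tag means a single annotated string yields both the detector input and the editing target. The sketch assumes the fix text contains no double quotes; a real pipeline would need escaping or a structured format.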

