Large Language Models (LLMs) often produce fluent but factually incorrect responses, a phenomenon known as hallucination. Abstention, where the model chooses not to answer and instead outputs phrases such as "I don't know", is a common safeguard. However, existing abstention methods typically rely on post-generation signals, such as variation across generated responses or feedback, which limits their ability to prevent unreliable responses in advance. In this paper, we introduce Aspect-Based Causal Abstention (ABCA), a new framework that enables early abstention by analysing the internal diversity of LLM knowledge through causal inference. This diversity reflects the multifaceted nature of parametric knowledge acquired from varied sources, spanning aspects such as disciplines, legal contexts, or temporal frames. ABCA estimates causal effects conditioned on these aspects to assess the reliability of the knowledge relevant to a given query. Based on these estimates, ABCA supports two types of abstention: Type-1, triggered when aspect effects are inconsistent (knowledge conflict), and Type-2, triggered when aspect effects consistently support abstention (knowledge insufficiency). Experiments on standard benchmarks demonstrate that ABCA improves abstention reliability, achieves state-of-the-art performance, and enhances the interpretability of abstention decisions.
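The Type-1/Type-2 distinction can be illustrated with a toy decision rule. The sketch below is not the paper's estimator: it assumes each aspect yields a single scalar effect estimate and uses simple dispersion and sign thresholds (conflict_threshold and support_threshold are hypothetical parameters) purely to show how inconsistent versus uniformly low aspect effects map to the two abstention types.

```python
import statistics


def abca_abstain(aspect_effects, conflict_threshold=0.5, support_threshold=0.0):
    """Toy decision rule over per-aspect effect estimates.

    aspect_effects: dict mapping an aspect name (e.g. "legal", "temporal")
    to a scalar estimate of how strongly that aspect's knowledge supports
    answering (positive) versus abstaining (non-positive). The aspect names,
    thresholds, and scalar interface are illustrative assumptions, not the
    paper's actual estimator or API.
    """
    effects = list(aspect_effects.values())

    # Type-1 abstention: aspect effects disagree strongly (knowledge conflict).
    if statistics.pstdev(effects) > conflict_threshold:
        return "abstain (Type-1: knowledge conflict)"

    # Type-2 abstention: aspects agree, but jointly favour abstention
    # (knowledge insufficiency).
    if max(effects) <= support_threshold:
        return "abstain (Type-2: knowledge insufficiency)"

    return "answer"


print(abca_abstain({"discipline": 0.9, "legal": -0.7, "temporal": 0.1}))   # Type-1
print(abca_abstain({"discipline": -0.2, "legal": -0.3, "temporal": -0.1}))  # Type-2
print(abca_abstain({"discipline": 0.6, "legal": 0.5, "temporal": 0.7}))     # answer
```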