Dynamic Retrieval-Augmented Generation (RAG) adaptively determines when to retrieve during generation to mitigate hallucinations in large language models (LLMs). However, existing methods rely on model-internal signals (e.g., logits, entropy), which are fundamentally unreliable because LLMs are typically ill-calibrated and often assign high confidence to erroneous outputs. We propose QuCo-RAG, which shifts the basis of retrieval decisions from subjective model confidence to objective statistics computed from the pre-training data. Our method quantifies uncertainty in two stages: (1) before generation, we identify low-frequency entities that indicate long-tail knowledge gaps; (2) during generation, we verify entity co-occurrence in the pre-training corpus, where zero co-occurrence often signals hallucination risk. Both stages leverage Infini-gram for millisecond-latency queries over a 4-trillion-token corpus, triggering retrieval when uncertainty is high. Experiments on multi-hop QA benchmarks show that QuCo-RAG achieves exact match (EM) gains of 5--12 points over state-of-the-art baselines with OLMo-2 models and transfers effectively to models whose pre-training data is undisclosed (Llama, Qwen, GPT), improving EM by up to 14 points. Domain generalization to biomedical QA further validates the robustness of our paradigm. These results establish corpus-grounded verification as a principled and largely model-agnostic paradigm for dynamic RAG. Our code is publicly available at https://github.com/ZhishanQ/QuCo-RAG.
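The following is a minimal sketch of the two corpus-statistics checks described above, not the released implementation. The `corpus_count` wrapper, the Infini-gram index name, the frequency threshold, and the document-level `AND` co-occurrence query are illustrative assumptions; QuCo-RAG's actual entity extraction, thresholds, and query formulation may differ.

```python
import requests

# Assumed public Infini-gram count API and an example index over a large
# pre-training corpus; both are illustrative choices, not the paper's setup.
INFINI_GRAM_API = "https://api.infini-gram.io/"
INDEX = "v4_dolma-v1_7_llama"


def corpus_count(query: str) -> int:
    """Return the corpus count for `query` via an Infini-gram count query."""
    resp = requests.post(
        INFINI_GRAM_API,
        json={"index": INDEX, "query_type": "count", "query": query},
    )
    resp.raise_for_status()
    return int(resp.json().get("count", 0))


# Stage 1 (before generation): flag long-tail entities by raw frequency.
FREQ_THRESHOLD = 1000  # illustrative cutoff for "low-frequency"


def is_long_tail(entity: str) -> bool:
    return corpus_count(entity) < FREQ_THRESHOLD


# Stage 2 (during generation): check whether two entities ever co-occur.
# Co-occurrence is approximated here with a document-level CNF "AND" query;
# zero matching documents is treated as a hallucination-risk signal.
def never_co_occur(entity_a: str, entity_b: str) -> bool:
    return corpus_count(f"{entity_a} AND {entity_b}") == 0


def should_retrieve(question_entities, generated_pair=None) -> bool:
    """Trigger retrieval when either corpus-statistics check signals risk."""
    if any(is_long_tail(e) for e in question_entities):
        return True
    if generated_pair is not None and never_co_occur(*generated_pair):
        return True
    return False
```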