Retrieval-Augmented Generation (RAG) effectively improves the accuracy of Large Language Models (LLMs). However, retrieval noise significantly degrades the quality of LLMs' generation, necessitating the development of denoising mechanisms. Previous methods extract evidence directly without explicit reasoning, which risks filtering out key clues and generalizes poorly. To this end, we propose LEAR, which learns to extract rational evidence by (1) explicitly reasoning to identify potential cues within the retrieved content, and then (2) consciously extracting them to avoid omitting any key cues helpful for answering questions. Specifically, we frame evidence reasoning and evidence extraction into one unified response for end-to-end training; apply knowledge token masks for disentanglement, deriving reasoning-based and extraction-based answers; and devise three types of verifiable reward functions (answer, length, and format) to update the model via a policy optimization algorithm. Extensive experiments on three benchmark datasets show the effectiveness of LEAR: it provides compact, high-quality evidence, improves the accuracy of downstream tasks, and facilitates deployment in online RAG systems.
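To make the reward design concrete, below is a minimal sketch of how the three verifiable rewards (answer, length, and format) might be combined into a single scalar training signal for policy optimization. The tag schema (`<think>`/`<evidence>`), the function names, the exact-match answer criterion, the linear length penalty, and the weights are all illustrative assumptions, not LEAR's actual implementation.

```python
# Illustrative sketch of the three verifiable rewards: answer, length, format.
# All names, the <think>/<evidence> template, and the weighting are assumptions.
import re


def answer_reward(predicted: str, gold: str) -> float:
    """1.0 if the extraction-based answer matches the gold answer (exact match)."""
    return 1.0 if predicted.strip().lower() == gold.strip().lower() else 0.0


def length_reward(evidence: str, context: str, target_ratio: float = 0.2) -> float:
    """Encourage compact evidence: full reward at or below a target fraction of
    the retrieved context, decaying linearly to 0 as evidence approaches full length."""
    if not context:
        return 0.0
    ratio = len(evidence) / len(context)
    return max(0.0, 1.0 - max(0.0, ratio - target_ratio) / (1.0 - target_ratio))


def format_reward(response: str) -> float:
    """1.0 if the unified response follows the reasoning-then-extraction template:
    a <think>...</think> block followed by an <evidence>...</evidence> block."""
    pattern = r"^<think>.*?</think>\s*<evidence>.*?</evidence>\s*$"
    return 1.0 if re.match(pattern, response, flags=re.DOTALL) else 0.0


def total_reward(response: str, evidence: str, predicted: str, gold: str,
                 context: str, w_ans: float = 1.0, w_len: float = 0.5,
                 w_fmt: float = 0.5) -> float:
    """Weighted sum serving as the scalar reward for the policy update."""
    return (w_ans * answer_reward(predicted, gold)
            + w_len * length_reward(evidence, context)
            + w_fmt * format_reward(response))
```

Under this assumed design, the answer reward grounds the extracted evidence in downstream correctness, the length reward pushes toward compactness, and the format reward keeps the unified reasoning-plus-extraction response parseable for end-to-end training.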