Retrieval-Augmented Generation (RAG) effectively improves the accuracy of Large Language Models (LLMs). However, retrieval noise significantly degrades the quality of LLMs' generation, necessitating the development of denoising mechanisms. Previous methods extract evidence directly without explicit reasoning, which risks filtering out key clues and struggles to generalize. To this end, we propose LEAR, which learns to extract rational evidence by (1) explicitly reasoning to identify potential cues within the retrieved content first, and then (2) consciously extracting evidence to avoid omitting any key cues helpful for answering questions. Specifically, we frame evidence reasoning and evidence extraction as one unified response for end-to-end training; apply knowledge token masks for disentanglement to derive reasoning-based and extraction-based answers; and devise three types of verifiable reward functions, covering answer, length, and format, to update the model via a policy optimization algorithm. Extensive experiments on three benchmark datasets show the effectiveness of LEAR, providing compact and high-quality evidence, improving the accuracy of downstream tasks, and facilitating effective application in online RAG systems.
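To make the three verifiable rewards mentioned above more concrete, the following is a minimal sketch of how answer, length, and format rewards could be computed and combined into a single scalar for policy optimization. The function names, tag conventions (`<think>`, `<evidence>`), weights, and the exact-match criterion are illustrative assumptions, not the authors' actual implementation.

```python
import re

def answer_reward(predicted: str, gold: str) -> float:
    """1.0 if the predicted answer matches the gold answer (simple exact match; assumed criterion)."""
    return 1.0 if predicted.strip().lower() == gold.strip().lower() else 0.0

def length_reward(evidence: str, max_tokens: int = 256) -> float:
    """Encourage compact evidence: full reward under a token budget, linear decay beyond it (assumed form)."""
    n = len(evidence.split())
    return 1.0 if n <= max_tokens else max(0.0, 1.0 - (n - max_tokens) / max_tokens)

def format_reward(response: str) -> float:
    """Check that the unified response contains both a reasoning span and an extracted-evidence span."""
    has_reasoning = re.search(r"<think>.*?</think>", response, re.S) is not None
    has_evidence = re.search(r"<evidence>.*?</evidence>", response, re.S) is not None
    return 1.0 if (has_reasoning and has_evidence) else 0.0

def total_reward(response: str, evidence: str, predicted: str, gold: str,
                 w_ans: float = 1.0, w_len: float = 0.5, w_fmt: float = 0.5) -> float:
    """Weighted sum of the three verifiable rewards; the weights are assumptions."""
    return (w_ans * answer_reward(predicted, gold)
            + w_len * length_reward(evidence)
            + w_fmt * format_reward(response))
```

Any such scalar reward could then be fed to a standard policy optimization algorithm (e.g., a PPO- or GRPO-style update); the abstract does not specify which one is used.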