Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) effectively improves the accuracy of Large Language Models (LLMs). However, retrieval noise significantly degrades the quality of LLM generation, which makes denoising mechanisms necessary. Previous methods extract evidence directly, without explicit reasoning, which risks filtering out key clues and generalizes poorly. To this end, we propose EviOmni, which learns to extract rational evidence by (1) first reasoning explicitly to identify potential cues in the retrieved content, and then (2) consciously extracting evidence so that no key cue helpful for answering the question is omitted. Specifically, we unify evidence reasoning and evidence extraction into a single response for end-to-end training; apply knowledge token masks for disentanglement, deriving separate reasoning-based and extraction-based answers; and devise three types of verifiable reward functions (answer, length, and format) to update the model via a policy optimization algorithm. Extensive experiments on three benchmark datasets demonstrate the effectiveness of EviOmni: it provides compact, high-quality evidence, improves the accuracy of downstream tasks, and supports effective application in online RAG systems.
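
To make the three verifiable rewards (answer, length, and format) more concrete, the following is a minimal sketch of how such rewards might be computed and combined. It is not the paper's implementation. The `<reason>`/`<extract>` tag names, the token-level F1 used for the answer reward, the `1 - ratio` form of the length reward, the weights, and all function names are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical response template: reasoning first, then extracted evidence.
# The tag names are an assumption, not taken from the paper.
RESPONSE_PATTERN = re.compile(
    r"^<reason>(?P<reason>.+?)</reason>\s*<extract>(?P<extract>.+?)</extract>$",
    re.DOTALL,
)


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed template, else 0.0."""
    return 1.0 if RESPONSE_PATTERN.match(response.strip()) else 0.0


def answer_reward(predicted: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and the gold answer (assumed metric)."""
    pred_tokens = predicted.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def length_reward(evidence: str, retrieved: str) -> float:
    """Favor compact evidence: 1 minus the evidence/retrieval length ratio, floored at 0."""
    retrieved_len = len(retrieved.split())
    if retrieved_len == 0:
        return 0.0
    ratio = len(evidence.split()) / retrieved_len
    return max(0.0, 1.0 - ratio)


def total_reward(response, predicted, gold, retrieved, weights=(1.0, 0.5, 0.5)):
    """Weighted sum of the three rewards; malformed responses receive 0.0."""
    match = RESPONSE_PATTERN.match(response.strip())
    if match is None:
        return 0.0
    w_ans, w_len, w_fmt = weights  # illustrative weights, not from the paper
    return (
        w_ans * answer_reward(predicted, gold)
        + w_len * length_reward(match.group("extract"), retrieved)
        + w_fmt * 1.0  # format reward is 1.0 because the template matched
    )
```

In this sketch, `predicted` stands for the answer a downstream generator produces from the extracted evidence. A policy optimization algorithm would then use `total_reward` as the scalar signal for each sampled response.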
