Retrieval-Augmented Generation (RAG) effectively enhances Large Language Models (LLMs) by incorporating retrieved external knowledge into the generation process. Reasoning models improve LLM performance on multi-hop QA tasks, which require integrating and reasoning over multiple pieces of evidence across different documents to answer a complex question. However, these models often incur substantial computational costs, including increased token consumption and inference latency. To better understand and mitigate this trade-off, we conduct a comprehensive study of the reasoning strategies that reasoning models employ in RAG multi-hop QA. Our findings reveal that reasoning models adopt structured strategies to integrate retrieved and internal knowledge, primarily following two modes: Context-Grounded Reasoning, which relies directly on retrieved content, and Knowledge-Reconciled Reasoning, which resolves conflicts or fills gaps using internal knowledge. Building on these findings, we propose a novel Lightweight Rerank Reasoning Strategy Framework for RAG (LiR$^3$AG) that enables non-reasoning models to transfer reasoning strategies by restructuring retrieved evidence into coherent reasoning chains. LiR$^3$AG reduces output-token overhead by an average of 98% and inference time by 58.6%, while improving the F1 score of an 8B non-reasoning model by 6.2% to 22.5%, allowing it to surpass a 32B reasoning model in RAG and offering a practical and efficient path forward for RAG systems.