While large language models (LLMs) show considerable promise across many fields, they have notable limitations in multi-document question answering (Multi-doc QA). The first challenge is long-range dependency modeling: LLMs struggle to focus on key information in long texts, which weakens important semantic connections. Second, most LLMs suffer from the "lost-in-the-middle" issue, having difficulty processing information in the middle of long inputs. Existing solutions either truncate global dependencies or demand costly fine-tuning, and none offers a simple, universal remedy for these challenges. To address these limitations, we propose Dual-Stage Adaptive Sharpening (DSAS), comprising two modules. (i) The Contextual Gate Weighting (CGW) module alleviates "lost-in-the-middle" by assessing paragraph relevance through layer-wise attention tracking and position-aware weighting. (ii) The Reciprocal Attention Suppression (RAS) module sharpens focus on critical paragraphs by suppressing information exchange between key and irrelevant texts, thereby mitigating the long-range dependency problem. Notably, DSAS is a plug-and-play solution requiring no architectural modifications or extra training parameters. Extensive experiments on four benchmarks demonstrate DSAS's efficacy across mainstream LLMs (Llama, Qwen, Mistral, and DeepSeek), with an average F1-score improvement of 4.2% on Multi-doc QA tasks for Llama-3.1-8B-Instruct and Qwen2.5-14B-Instruct. Ablation studies confirm the essential contributions of both the CGW and RAS modules. Detailed discussions in the Appendix further validate the robustness and scalability of DSAS.
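Because DSAS operates purely on attention maps at inference time, its two stages can be sketched in a few lines of PyTorch. The sketch below is a minimal illustration, not the paper's formulation: the function names, the triangular position weighting, the top-k key-paragraph selection, and the suppression constant are all assumptions introduced for clarity.

```python
# Illustrative sketch of the two DSAS stages described above.
# The exact scoring and suppression formulas are not specified here,
# so the position weighting, relevance threshold (top-k), and
# suppression factor below are assumptions, not the actual method.
import torch

def cgw_relevance(attn, para_spans, query_span):
    """Contextual Gate Weighting (sketch): score each paragraph by the
    attention its tokens receive from the query tokens, averaged over
    layers and heads, with a position-aware correction that up-weights
    the middle of the input (assumed form)."""
    # attn: [layers, heads, seq, seq] attention weights (row attends to col)
    q0, q1 = query_span
    seq_len = attn.shape[-1]
    scores = []
    for p0, p1 in para_spans:
        # Layer-wise attention tracking: mean attention mass flowing
        # from the query tokens onto this paragraph, across all layers/heads.
        mass = attn[:, :, q0:q1, p0:p1].mean()
        # Position-aware weighting: boost middle positions, where the
        # "lost-in-the-middle" effect hurts most (illustrative ramp,
        # 2.0 at the center of the sequence, 1.0 at either end).
        center = (p0 + p1) / 2 / seq_len
        pos_w = 1.0 + (1.0 - abs(2 * center - 1.0))
        scores.append(mass * pos_w)
    return torch.stack(scores)

def ras_mask(scores, para_spans, seq_len, top_k=2, suppress=-1e4):
    """Reciprocal Attention Suppression (sketch): build an additive
    attention-logit mask that blocks information exchange between the
    top-k "key" paragraphs and the remaining, irrelevant ones."""
    mask = torch.zeros(seq_len, seq_len)
    key = set(scores.topk(min(top_k, len(para_spans))).indices.tolist())
    for i, (a0, a1) in enumerate(para_spans):
        for j, (b0, b1) in enumerate(para_spans):
            # Suppress both directions between key and non-key paragraphs.
            if (i in key) != (j in key):
                mask[a0:a1, b0:b1] = suppress
    return mask  # added to attention logits before softmax

# Toy usage: 2 layers, 2 heads, 12 tokens; three 3-token paragraphs
# followed by a 3-token question span.
attn = torch.softmax(torch.randn(2, 2, 12, 12), dim=-1)
spans = [(0, 3), (3, 6), (6, 9)]
scores = cgw_relevance(attn, spans, query_span=(9, 12))
mask = ras_mask(scores, spans, seq_len=12, top_k=1)
```

The design matches the plug-and-play claim: both stages consume and modify attention quantities only, so no weights are retrained and no architecture is changed.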