QUITO-X: A New Perspective on Context Compression from the Information Bottleneck Theory

Generative LLMs have achieved remarkable success in various industrial applications, owing to their promising in-context learning capabilities. However, the long contexts that complex tasks require pose a significant barrier to wider adoption, in two main ways: (i) excessively long contexts lead to high costs and inference delays; (ii) the substantial amount of task-irrelevant information that long contexts introduce exacerbates the "lost in the middle" problem. Existing methods compress context by removing redundant tokens according to metrics such as self-information or perplexity (PPL), which is inconsistent with the objective of retaining the tokens that matter most when conditioning on a given query. In this study, we introduce information bottleneck (IB) theory to model the problem, offering a novel perspective that thoroughly addresses the essential properties required for context compression. Additionally, we propose a cross-attention-based approach to approximate the mutual information in IB, which can be flexibly replaced with suitable alternatives in different scenarios. Extensive experiments on four datasets demonstrate that our method achieves a 25% increase in compression rate compared to the state of the art, while maintaining question answering performance. In particular, the context compressed by our method even outperforms the full context in some cases.
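
As a rough illustration of query-conditioned compression, the sketch below keeps the highest-scoring tokens of a context and preserves their original order. It is not the authors' implementation: the function name `compress_context`, the `keep_ratio` parameter, and the example scores are hypothetical, and the scores only stand in for whatever relevance signal is used (for example, cross-attention weights from the query to each context token, as the abstract suggests). For reference, the information bottleneck principle in its standard form seeks a compressed representation T of an input X that minimizes I(X; T) - β·I(T; Y), where Y is the target variable; how the paper instantiates these terms is not stated in the abstract.

```python
from typing import List, Sequence


def compress_context(tokens: Sequence[str],
                     scores: Sequence[float],
                     keep_ratio: float) -> List[str]:
    """Keep the highest-scoring tokens, preserving their original order.

    `scores` is assumed to hold one query-conditioned relevance value per
    token (hypothetical input; e.g. averaged cross-attention weights).
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not tokens:
        return []
    n_keep = max(1, round(len(tokens) * keep_ratio))
    # Rank token indices by score (descending); ties go to the earlier token.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    # Restore document order for the tokens that survive.
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept]


# Hypothetical example: made-up scores for the query "Where is the Eiffel Tower?"
tokens = ["The", "Eiffel", "Tower", "is", "located", "in", "Paris", ",", "France", "."]
scores = [0.02, 0.20, 0.18, 0.03, 0.10, 0.04, 0.30, 0.01, 0.10, 0.02]
compressed = compress_context(tokens, scores, keep_ratio=0.5)
print(compressed)
```

Selecting by a per-token score and then re-sorting by position is only one simple design; a real system might instead score and keep whole words or sentences so that the compressed context stays readable.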
