Code generation under long contexts is becoming increasingly critical as Large Language Models (LLMs) are required to reason over extensive information spread across entire codebases. While recent advances enable code LLMs to process long inputs, high API costs and generation latency remain substantial bottlenecks. Existing context pruning techniques, such as LLMLingua, achieve promising results on general text but overlook code-specific structures and dependencies, leading to suboptimal performance on programming tasks. In this paper, we propose LongCodeZip, a novel plug-and-play code compression framework designed specifically for code LLMs. LongCodeZip employs a dual-stage strategy: (1) coarse-grained compression, which ranks function-level chunks by their conditional perplexity with respect to the instruction and retains only the most relevant functions; and (2) fine-grained compression, which segments the retained functions into blocks based on perplexity and selects an optimal subset under an adaptive token budget to maximize relevance. Evaluations across multiple tasks, including code completion, summarization, and question answering, show that LongCodeZip consistently outperforms baseline methods, achieving up to a 5.6x compression ratio without degrading task performance. By effectively reducing context size while preserving essential information, LongCodeZip enables LLMs to scale to real-world, large-scale code scenarios, advancing the efficiency and capability of code intelligence applications.
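To make the dual-stage strategy concrete, the sketch below shows one way the perplexity-based selection could look in Python. It is a minimal illustration under stated assumptions, not the paper's implementation: the model name, the blank-line block segmentation, and the greedy budget filling are placeholders for LongCodeZip's actual chunking and optimal-subset selection.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-Coder-0.5B"  # assumption: any small causal code LM works here
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

@torch.no_grad()
def conditional_ppl(chunk: str, instruction: str) -> float:
    """Perplexity of the instruction tokens conditioned on a code chunk
    (lower = the chunk makes the instruction more predictable, i.e. more relevant)."""
    ctx_ids = tok(chunk, return_tensors="pt").input_ids
    ins_ids = tok(instruction, return_tensors="pt", add_special_tokens=False).input_ids
    input_ids = torch.cat([ctx_ids, ins_ids], dim=1)
    logits = model(input_ids).logits
    start = ctx_ids.shape[1]
    # Logits at position i predict token i+1, so score only the instruction span.
    shift_logits = logits[:, start - 1 : -1, :]
    shift_labels = input_ids[:, start:]
    loss = torch.nn.functional.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
    return loss.exp().item()

def coarse_compress(functions: list[str], instruction: str, budget: int) -> list[str]:
    """Stage 1: rank function-level chunks by conditional PPL and greedily
    keep the most relevant ones that fit within the token budget."""
    ranked = sorted(functions, key=lambda f: conditional_ppl(f, instruction))
    kept, used = [], 0
    for fn in ranked:
        n = len(tok(fn).input_ids)
        if used + n <= budget:
            kept.append(fn)
            used += n
    return kept

def fine_compress(function_src: str, instruction: str, budget: int) -> str:
    """Stage 2 (simplified): split a retained function into blocks and keep the
    most relevant subset under the budget; the paper selects an optimal subset,
    whereas this sketch reuses the same greedy fill for brevity."""
    blocks = [b for b in function_src.split("\n\n") if b.strip()]  # naive segmentation
    ranked = sorted(blocks, key=lambda b: conditional_ppl(b, instruction))
    kept, used = [], 0
    for b in ranked:
        n = len(tok(b).input_ids)
        if used + n <= budget:
            kept.append(b)
            used += n
    kept.sort(key=function_src.index)  # restore original source order
    return "\n\n".join(kept)
```

A caller would first apply coarse_compress to all function-level chunks of the repository context, then fine_compress to each survivor, concatenating the results as the compressed prompt; both stages reuse the same conditional-perplexity scorer, which is the core idea the abstract describes.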