Context as a Tool: Context Management for Long-Horizon SWE-Agents

Agents based on large language models have recently shown strong potential on real-world software engineering (SWE) tasks that require long-horizon interaction with repository-scale codebases. However, most existing agents rely on append-only context maintenance or passively triggered compression heuristics, which often lead to context explosion, semantic drift, and degraded reasoning in long-running interactions. We propose CAT, a new context management paradigm that elevates context maintenance to a callable tool integrated into the decision-making process of agents. CAT formalizes a structured context workspace consisting of stable task semantics, condensed long-term memory, and high-fidelity short-term interactions, and enables agents to proactively compress historical trajectories into actionable summaries at appropriate milestones. To support context management for SWE-agents, we propose a trajectory-level supervision framework, CAT-GENERATOR, based on an offline data construction pipeline that injects context-management actions into complete interaction trajectories. Using this framework, we train a context-aware model, SWE-Compressor. Experiments on SWE-Bench-Verified demonstrate that SWE-Compressor reaches a 57.6% solved rate and significantly outperforms ReAct-based agents and static compression baselines, while maintaining stable and scalable long-horizon reasoning under a bounded context budget.
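As a rough illustration of the workspace described above, the following minimal Python sketch models the three parts (stable task semantics, condensed long-term memory, high-fidelity short-term interactions) and a compression action that the agent can invoke as a tool. All names here (`ContextWorkspace`, `record`, `compress`, `render`, `toy_summarize`, `keep_last`) are hypothetical and chosen for illustration. They are not taken from CAT or SWE-Compressor, and the string-truncating summarizer is a stand-in for whatever model-generated summary the actual system produces.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ContextWorkspace:
    """Hypothetical three-part context workspace (illustrative only)."""

    task: str                                             # stable task semantics, never compressed
    long_term: List[str] = field(default_factory=list)    # condensed summaries of past work
    short_term: List[str] = field(default_factory=list)   # recent turns, kept verbatim

    def record(self, turn: str) -> None:
        """Append one interaction turn to the short-term buffer."""
        self.short_term.append(turn)

    def compress(self, summarize: Callable[[List[str]], str], keep_last: int = 1) -> None:
        """Context-management action: fold older turns into one summary.

        The agent is assumed to call this at a milestone of its own choosing.
        The most recent `keep_last` turns stay verbatim.
        """
        if keep_last < 0:
            raise ValueError("keep_last must be non-negative")
        cut = len(self.short_term) - keep_last
        if cut <= 0:
            return  # nothing old enough to compress
        old, recent = self.short_term[:cut], self.short_term[cut:]
        self.long_term.append(summarize(old))
        self.short_term = recent

    def render(self) -> str:
        """Serialize the workspace into the text that would be sent to the model."""
        parts = [f"[TASK] {self.task}"]
        parts += [f"[MEMORY] {m}" for m in self.long_term]
        parts += [f"[RECENT] {t}" for t in self.short_term]
        return "\n".join(parts)


def toy_summarize(turns: List[str]) -> str:
    """Placeholder summarizer: counts the turns and keeps a short prefix of each."""
    return f"{len(turns)} earlier steps: " + "; ".join(t[:20] for t in turns)


if __name__ == "__main__":
    ws = ContextWorkspace(task="Fix failing test")
    for i in range(4):
        ws.record(f"step {i}")
    ws.compress(toy_summarize, keep_last=1)
    print(ws.render())
```

In this sketch the rendered context grows only by one summary line per compression call, which is one simple way to keep the prompt within a bounded budget. When and what to compress is left to the caller, mirroring the idea that compression is a decision the agent makes rather than a passive trigger.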
