Atomic commits, each of which addresses a single development concern, are a best practice in software development. In practice, however, developers often produce tangled commits that mix unrelated changes, complicating code review and maintenance. Prior untangling approaches (rule-based, feature-based, or graph-based) have made progress, but they typically rely on shallow signals and struggle to distinguish explicit dependencies (e.g., control and data flow) from implicit ones (e.g., semantic or conceptual relationships). In this paper, we propose ColaUntangle, a new collaborative consultation framework for commit untangling that models both explicit and implicit dependencies among code changes. ColaUntangle integrates Large Language Model (LLM)-driven agents in a multi-agent architecture: one agent specializes in explicit dependencies, another in implicit ones, and a reviewer agent synthesizes their perspectives through iterative consultation. To capture structural and contextual information, we construct Explicit and Implicit Contexts, which let the agents reason over code relationships with both symbolic and semantic depth. We evaluate ColaUntangle on two widely used datasets (1,612 C# and 14k Java tangled commits). Experimental results show that ColaUntangle outperforms the best-performing baseline, achieving an improvement of 44% on the C# dataset and 82% on the Java dataset. These findings highlight the potential of LLM-based collaborative frameworks for advancing automated commit untangling.
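To make the consultation loop concrete, the following is a minimal Python sketch of one way the three roles could interact. It is not the ColaUntangle implementation: the abstract does not specify the protocol, so the function names (`consult`, `toy_agent`, `toy_reviewer`), the representation of a grouping as a hunk-to-label dictionary, the stopping rule, and the round limit are all assumptions made for illustration. The LLM-driven agents are replaced by deterministic stubs so that the example runs without any model or API.

```python
from typing import Callable, Dict, List, Optional

# Assumed representation: a grouping maps each changed hunk id to a concern label.
Grouping = Dict[str, int]
Context = Dict[str, str]
# Hypothetical agent interface: (hunks, context, reviewer feedback) -> grouping.
Agent = Callable[[List[str], Context, Optional[Grouping]], Grouping]
Reviewer = Callable[[Grouping, Grouping], Grouping]


def same_partition(a: Grouping, b: Grouping) -> bool:
    """True if both groupings split the hunks identically, ignoring label names."""
    def blocks(g: Grouping) -> set:
        by_label: Dict[int, set] = {}
        for hunk, label in g.items():
            by_label.setdefault(label, set()).add(hunk)
        return {frozenset(s) for s in by_label.values()}
    return blocks(a) == blocks(b)


def consult(hunks: List[str], explicit_agent: Agent, implicit_agent: Agent,
            reviewer: Reviewer, explicit_ctx: Context, implicit_ctx: Context,
            max_rounds: int = 3) -> Grouping:
    """Iterate until both agents agree or the (assumed) round budget runs out."""
    feedback: Optional[Grouping] = None
    for _ in range(max_rounds):
        explicit_view = explicit_agent(hunks, explicit_ctx, feedback)
        implicit_view = implicit_agent(hunks, implicit_ctx, feedback)
        feedback = reviewer(explicit_view, implicit_view)
        if same_partition(explicit_view, implicit_view):
            break
    return feedback


def toy_agent(hunks: List[str], context: Context,
              feedback: Optional[Grouping]) -> Grouping:
    """Stub agent: groups hunks by their context value, then defers to feedback."""
    if feedback is not None:
        return dict(feedback)
    labels: Dict[str, int] = {}
    grouping: Grouping = {}
    for hunk in hunks:
        key = context[hunk]
        if key not in labels:
            labels[key] = len(labels)
        grouping[hunk] = labels[key]
    return grouping


def toy_reviewer(a: Grouping, b: Grouping) -> Grouping:
    """Stub reviewer: links two hunks if either agent grouped them (union-find)."""
    parent = {hunk: hunk for hunk in a}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for grouping in (a, b):
        first_seen: Dict[int, str] = {}
        for hunk, label in grouping.items():
            if label in first_seen:
                parent[find(hunk)] = find(first_seen[label])
            else:
                first_seen[label] = hunk

    roots: Dict[str, int] = {}
    merged: Grouping = {}
    for hunk in a:
        root = find(hunk)
        if root not in roots:
            roots[root] = len(roots)
        merged[hunk] = roots[root]
    return merged


if __name__ == "__main__":
    hunks = ["h1", "h2", "h3", "h4"]
    # Made-up contexts: file names stand in for explicit structure,
    # topic tags stand in for implicit semantics.
    explicit_ctx = {"h1": "Parser.java", "h2": "Parser.java",
                    "h3": "Lexer.java", "h4": "README"}
    implicit_ctx = {"h1": "fix-null", "h2": "rename",
                    "h3": "rename", "h4": "docs"}
    print(consult(hunks, toy_agent, toy_agent, toy_reviewer,
                  explicit_ctx, implicit_ctx))
```

The stub reviewer simply merges any two hunks that either agent links, which is a deliberately crude placeholder. In a system like the one described, the reviewer would presumably weigh the two agents' arguments rather than take a union, and the agents would revise their proposals in response to critique rather than adopt the feedback wholesale.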