Agentic Refactoring: An Empirical Study of AI Coding Agents

Agentic coding tools, such as OpenAI Codex, Claude Code, and Cursor, are transforming the software engineering landscape. These AI-powered systems function as autonomous teammates capable of planning and executing complex development tasks. Agents have become active participants in refactoring, a cornerstone of sustainable software development that aims to improve internal code quality without altering observable behavior. Despite their growing adoption, there is little empirical understanding of how agentic refactoring is used in practice, how it compares to human-driven refactoring, and what impact it has on code quality. To address this gap, we present a large-scale study of AI agent-generated refactorings in real-world open-source Java projects, analyzing 15,451 refactoring instances across 12,256 pull requests and 14,988 commits derived from the AIDev dataset. Our analysis shows that refactoring is a common and intentional activity in this development paradigm: agents explicitly target refactoring in 26.1% of commits. The distribution of refactoring types shows that agentic efforts are dominated by low-level, consistency-oriented edits, such as Change Variable Type (11.8%), Rename Parameter (10.4%), and Rename Variable (8.5%), reflecting a preference for localized improvements over the high-level design changes common in human refactoring. The motivations behind agentic refactoring focus overwhelmingly on internal quality concerns, led by maintainability (52.5%) and readability (28.1%). Finally, a quantitative evaluation of code quality metrics shows that agentic refactoring yields small but statistically significant improvements in structural metrics, particularly for medium-level changes, reducing class size and complexity (e.g., Class LOC median $\Delta = -15.25$).
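The Class LOC median $\Delta$ summarizes paired before/after measurements. A minimal sketch of that arithmetic follows. It assumes that $\Delta$ is computed per class as LOC after refactoring minus LOC before, so negative values mean classes shrank, and that classes are matched by name across the two snapshots. The function names and sample values are hypothetical and are not taken from the study's tooling or data.

```python
from statistics import median


def loc_deltas(before: dict[str, int], after: dict[str, int]) -> list[int]:
    """Per-class LOC change (after - before) for classes present in both snapshots.

    Assumption: classes are matched by name; classes added or deleted by the
    change are ignored because they have no paired measurement.
    """
    shared = sorted(before.keys() & after.keys())
    return [after[name] - before[name] for name in shared]


def median_delta(before: dict[str, int], after: dict[str, int]) -> float:
    """Median of the paired per-class LOC deltas (negative = classes got smaller)."""
    deltas = loc_deltas(before, after)
    if not deltas:
        raise ValueError("no classes present in both snapshots")
    return median(deltas)


# Hypothetical LOC snapshots before and after a refactoring commit.
before = {"OrderService": 120, "Invoice": 80, "Parser": 200, "Util": 45}
after = {"OrderService": 100, "Invoice": 78, "Parser": 170, "Util": 45}
print(median_delta(before, after))  # -11.0
```

With an even number of paired classes, `statistics.median` averages the two middle deltas (here −20 and −2). That is why the result can be fractional, as in the reported −15.25.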

翻译：智能体编码工具，如OpenAI Codex、Claude Code和Cursor，正在重塑软件工程领域。这些AI驱动的系统作为自主协作伙伴，能够规划并执行复杂的开发任务。在重构——这一旨在提升内部代码质量而不改变可观测行为的可持续软件开发基石——过程中，智能体已成为积极的参与者。尽管其应用日益广泛，但关于智能体重构在实践中的使用方式、与人工驱动重构的对比及其对代码质量的影响，仍严重缺乏实证理解。为填补这一实证空白，我们基于AIDev数据集，对真实世界开源Java项目中AI智能体生成的重构进行了大规模研究，分析了来自12,256个拉取请求和14,988次提交的15,451个重构实例。实证分析表明，重构是该开发范式中常见且有意为之的活动，智能体在26.1%的提交中明确以重构为目标。重构类型分析显示，智能体的重构工作主要由低层次、面向一致性的编辑主导，例如变更变量类型（11.8%）、重命名参数（10.4%）和重命名变量（8.5%），这反映出其偏好局部改进而非人工重构中常见的高层次设计变更。此外，智能体重构的动机绝大多数集中于内部质量考量，其中可维护性（52.5%）和可读性（28.1%）占主导。进一步地，代码质量指标的定量评估表明，智能体重构在结构指标上带来了虽小但统计显著的改进，尤其对于中等程度变更，有效降低了类规模与复杂度（例如类LOC中位数变化Δ = -15.25）。