Code review (CR) is a crucial practice for ensuring software quality. Various automated review comment generation techniques have been proposed to streamline this labor-intensive process. However, existing approaches rely heavily on a single model to identify the various issues within the code, which limits the model's ability to handle the diverse, issue-specific nature of code changes and leads to uninformative comments, especially in complex scenarios such as bug fixes. To address these limitations, we propose RevAgent, a novel agent-based, issue-oriented framework that decomposes the task into three stages: (1) a Generation Stage, where five category-specific commentator agents analyze code changes from distinct issue perspectives and generate candidate comments; (2) a Discrimination Stage, where a critic agent selects the most appropriate issue-comment pair; and (3) a Training Stage, where all agents are fine-tuned on curated, category-specific data to enhance task specialization. Evaluation results show that RevAgent significantly outperforms state-of-the-art PLM- and LLM-based baselines, with improvements of 12.90\%, 10.87\%, 6.32\%, and 8.57\% in BLEU, ROUGE-L, METEOR, and SBERT, respectively. It also achieves higher accuracy in issue-category identification, particularly in challenging scenarios. Human evaluations further validate the practicality of RevAgent in generating accurate, readable, and context-aware review comments. Moreover, RevAgent offers a favorable trade-off between performance and efficiency.