Automated program repair using neural models has shown promising results on benchmark datasets, yet practical deployment remains limited. In this study, we examine whether a small transformer model can meaningfully repair real-world Java bugs and whether syntactic correctness is a reliable proxy for semantic correctness. We fine-tune CodeT5-small (60.5M parameters) on 52,364 Java bug-fix pairs from CodeXGLUE and evaluate both token-level performance and syntactic validity via AST parsing. While the model converges cleanly and achieves high grammatical correctness, producing syntactically valid Java code in approximately 94% of cases, it yields zero exact-match repairs. In approximately 80% of cases, the model reproduces the buggy input verbatim.
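To make the syntactic-validity criterion concrete, the following is a minimal sketch of how such a check could be implemented. The choice of the Python `javalang` parser, the dummy-class wrapping, and the whitespace-normalized exact-match definition are illustrative assumptions, not details taken from the study.

```python
import javalang  # assumed parser choice; pip install javalang


def is_syntactically_valid(method_src: str) -> bool:
    """Return True if a generated Java method parses as valid Java.

    CodeXGLUE bug-fix pairs are method-level snippets, so we wrap the
    candidate in a dummy class to form a parsable compilation unit.
    """
    wrapped = "class Dummy { " + method_src + " }"
    try:
        javalang.parse.parse(wrapped)  # raises on malformed input
        return True
    except (javalang.parser.JavaSyntaxError, javalang.tokenizer.LexerError):
        return False


def exact_match(pred: str, gold: str) -> bool:
    """One plausible exact-match criterion: compare after whitespace normalization."""
    return " ".join(pred.split()) == " ".join(gold.split())
```

Under this kind of evaluation, a model that copies its buggy input verbatim can still score high on syntactic validity while achieving zero exact matches, which is consistent with the gap the abstract reports.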