Automated Program Repair (APR) aims to help developers automatically patch software bugs. However, current state-of-the-art traditional and learning-based APR techniques suffer from limited patch variety and fail to fix complicated bugs. This is mainly due to their reliance on bug-fixing datasets to craft fix templates or to directly predict potential patches. Large Pre-Trained Language Models (PLMs), trained on billions of text/code tokens, can potentially help avoid this issue. Very recently, researchers have directly leveraged PLMs for APR without relying on any bug-fixing datasets. However, such existing work either failed to include state-of-the-art PLMs or was not evaluated on realistic datasets. In this work, we perform the first extensive study on directly applying PLMs for APR. We select 9 recent state-of-the-art PLMs, including both generative and infilling models, ranging from 125M to 20B parameters. We design 3 different repair settings to evaluate the different ways PLMs can be used to generate patches. We apply the PLMs under these repair settings to 5 datasets across 3 different languages and compare the PLMs in terms of the number of bugs fixed, generation speed, and compilation rate. Our study demonstrates that directly applying state-of-the-art PLMs can already substantially outperform all existing APR techniques on all our datasets. Among the studied PLMs, a scaling effect exists for APR: larger models tend to achieve better performance. We also show for the first time that the suffix code after the buggy line (adopted in infilling-style APR, sketched below) is important for generating not only more fixes but also more patches with higher compilation rates. Beyond patch generation, the PLMs consider correct patches to be more natural than incorrect ones, and can even be leveraged for effective patch ranking and patch correctness checking.
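To make the infilling setting concrete, the following is a minimal sketch (not from the paper) of how a single-line repair prompt could be constructed. The infill helper here is hypothetical, standing in for any infilling-capable PLM; a real implementation would query one of the studied models and then validate each candidate patch by compilation and test execution.

    # Minimal sketch of infilling-style APR, assuming a hypothetical
    # infill(prefix, suffix, n) wrapper around an infilling-capable PLM.
    from typing import List

    def infill(prefix: str, suffix: str, n: int = 10) -> List[str]:
        """Hypothetical PLM call: returns up to n candidate replacements
        for the masked region between prefix and suffix. A real
        implementation would query an infilling model."""
        return ["if x > 0:"]  # placeholder output for this sketch

    def repair_single_line(source_lines: List[str], buggy_idx: int) -> List[str]:
        # Split the buggy function around the suspicious line. The suffix
        # (code after the buggy line) is what distinguishes infilling-style
        # APR from plain left-to-right generation.
        prefix = "\n".join(source_lines[:buggy_idx])
        suffix = "\n".join(source_lines[buggy_idx + 1:])
        candidates = infill(prefix, suffix)
        # Re-assemble full candidate patches for later compilation and
        # test-based validation.
        return ["\n".join([prefix, line, suffix]) for line in candidates]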