Software debugging and program repair are among the most time-consuming and labor-intensive tasks in software engineering, and both would benefit greatly from automation. In this paper, we propose a novel automated program repair approach based on CodeBERT, a transformer-based neural architecture pre-trained on a large corpus of source code. We fine-tune our model on the small and large ManySStuBs4J datasets to automatically generate fixed code. The results show that our technique accurately predicts the fixes implemented by developers in 19-72% of cases, depending on the dataset, in less than a second per bug. We also observe that our method can generate fixes of varied length (both short and long) and can repair different types of bugs, even when only a few instances of a given bug type exist in the training data.
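As a minimal illustration of the fine-tuning setup described above (a sketch, not the authors' actual pipeline), ManySStuBs4J-style bug records can be converted into (buggy, fixed) sequence pairs for a CodeBERT-based sequence-to-sequence model; the record field names below are assumptions for illustration:

```python
# Hypothetical ManySStuBs4J-style records; the field names
# ("bugType", "sourceBeforeFix", "sourceAfterFix") are assumptions
# for this sketch, not a verified schema.
records = [
    {"bugType": "CHANGE_OPERATOR",
     "sourceBeforeFix": "if (a == b)",
     "sourceAfterFix": "if (a != b)"},
    {"bugType": "SWAP_ARGUMENTS",
     "sourceBeforeFix": "max(lo, hi)",
     "sourceAfterFix": "max(hi, lo)"},
]

def to_seq2seq_pairs(records):
    """Turn bug records into (source, target) pairs: the buggy line is
    the encoder input, the developer's fix is the generation target."""
    return [(r["sourceBeforeFix"], r["sourceAfterFix"]) for r in records]

pairs = to_seq2seq_pairs(records)
for src, tgt in pairs:
    print(f"{src} -> {tgt}")
```

In such a setup, the pairs would then be tokenized and fed to an encoder-decoder model whose encoder is initialized from CodeBERT, with the decoder trained to emit the fixed code.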