Learning-based program repair has achieved good results in a recent series of papers. Yet, we observe that related work fails to repair some bugs because it lacks knowledge of 1) the application domain of the program being repaired, and 2) the fault type being repaired. In this paper, we address both problems by changing the learning paradigm from supervised training to self-supervised training in an approach called SelfAPR. First, SelfAPR generates training samples on disk by perturbing a previous version of the program being repaired, forcing the neural model to capture project-specific knowledge. This differs from previous work based on mined past commits. Second, SelfAPR executes all training samples and extracts and encodes test execution diagnostics into the input representation, steering the neural model toward the kind of fault to fix. This differs from existing studies that consider only static source code as input. We implement SelfAPR and evaluate it systematically. We generate 1,039,873 training samples by perturbing 17 open-source projects. We evaluate SelfAPR on 818 bugs from Defects4J; SelfAPR correctly repairs 110 of them, outperforming all supervised-learning repair approaches.
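The core self-supervised idea, generating (buggy, fixed) training pairs by perturbing known-correct code rather than mining past commits, can be illustrated with a minimal sketch. This is not the actual SelfAPR implementation; the perturbation operators and helper names below are hypothetical simplifications of the paper's perturbation model:

```python
# Illustrative sketch (NOT the real SelfAPR code): create self-supervised
# training pairs by perturbing a correct line of code. The model's input is
# the perturbed (artificially buggy) line; the training label is the original.
import random

# Hypothetical perturbation operators, loosely inspired by common fault types.
PERTURBATIONS = [
    lambda line: line.replace("==", "!="),  # flip a relational operator
    lambda line: line.replace("+", "-"),    # flip an arithmetic operator
    lambda line: "",                        # delete the statement entirely
]

def make_training_pair(line: str, rng: random.Random):
    """Apply one applicable perturbation; return (buggy, fixed) or None."""
    ops = list(PERTURBATIONS)
    rng.shuffle(ops)
    for op in ops:
        buggy = op(line)
        if buggy != line:
            # input = perturbed code, label = the original correct code
            return buggy, line
    return None  # no operator changed this line

rng = random.Random(0)
pair = make_training_pair("if (a == b) { total = total + x; }", rng)
```

In the paper's full pipeline, each generated sample is additionally compiled and executed so that test diagnostics can be encoded into the model input; this sketch covers only the perturbation step.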