Automated program repair is the task of automatically fixing software bugs. A promising direction in this field is self-supervised learning, a paradigm in which repair models are trained without commits representing bug/fix pairs. In self-supervised neural program repair, those bug/fix pairs must instead be generated automatically; the core problem is to generate interesting and diverse pairs that maximize the effectiveness of training. As a contribution to this problem, we propose to use back-translation, a technique originating in neural machine translation. We devise and implement MUFIN, a back-translation training technique for program repair, with specifically designed code critics that select high-quality training samples. Our results show that MUFIN's back-translation loop generates valuable training samples in a fully automated, self-supervised manner, producing more than half a million bug/fix pairs. The code critic design is key because of a fundamental trade-off between how restrictive a critic is and how many samples remain available for optimization during back-translation.
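The back-translation loop with a critic filter can be sketched in miniature as follows. This is a hedged illustration only: the `breaker`, `critic`, and toy corpus below are hypothetical stand-ins, not MUFIN's actual models or data.

```python
# Minimal sketch of a back-translation loop for program repair.
# Assumption: a "breaker" turns correct programs into buggy variants,
# a "critic" filters out low-quality samples, and the accepted
# (buggy, fixed) pairs are handed to a fixer model for training.

def back_translation_loop(corpus, breaker, train_fixer, critic, rounds=1):
    """Generate (buggy, fixed) training pairs from unlabeled programs.

    corpus:      iterable of presumed-correct programs.
    breaker:     maps a correct program to a candidate buggy variant.
    train_fixer: consumes the accepted (buggy, fixed) pairs.
    critic:      returns True if a generated pair is worth keeping.
    """
    pairs = []
    for _ in range(rounds):
        for program in corpus:
            buggy = breaker(program)
            # A more restrictive critic yields higher-quality pairs
            # but fewer of them -- the trade-off noted in the abstract.
            if critic(buggy, program):
                pairs.append((buggy, program))
    train_fixer(pairs)
    return pairs

# Toy usage: the breaker flips a comparison operator, and the
# critic rejects no-op "mutations" that leave the program unchanged.
corpus = ["if a < b: return a", "return x + y"]
breaker = lambda p: p.replace("<", ">=")
critic = lambda buggy, fixed: buggy != fixed
collected = []
back_translation_loop(corpus, breaker, collected.extend, critic)
print(len(collected))  # only the program that was actually mutated is kept
```

In a real system the critic would check stronger properties, for example that the buggy variant still compiles, which is why its strictness directly controls how many training samples survive each round.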