Real bug fixes found in open-source repositories seem to be the perfect source for learning to localize and repair real bugs. However, the absence of large-scale bug-fix collections has made it difficult to effectively exploit real bug fixes in the training of larger neural models. In contrast, artificial bugs -- produced by mutating existing source code -- can easily be obtained at sufficient scale and are therefore often preferred for training existing approaches. Still, localization and repair models trained on artificial bugs usually underperform when faced with real bugs. This raises the question of whether bug localization and repair models trained on real bug fixes are more effective at localizing and repairing real bugs. We address this question by introducing RealiT, a pre-train-and-fine-tune approach for effectively learning to localize and repair real bugs from real bug fixes. RealiT is first pre-trained on a large number of artificial bugs produced by traditional mutation operators and then fine-tuned on a smaller set of real bug fixes. Fine-tuning requires no modification of the learning algorithm and can therefore easily be adopted in various training scenarios for bug localization or repair (even when real training data is scarce). Empirically, we found training on real bug fixes with RealiT to be powerful: it nearly doubles the localization performance of an existing model on real bugs while maintaining or even improving its repair performance.
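The abstract mentions that artificial bugs are produced by traditional mutation operators applied to existing source code. As a minimal illustration (not RealiT's actual implementation; all names here are hypothetical), one such operator might swap a binary arithmetic operator to turn correct code into an artificial bug:

```python
import ast

# Hypothetical mapping for a classic "arithmetic operator replacement" mutation:
# swapping one operator for another yields a plausible artificial bug.
OP_SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.Mult: ast.Div}

class BinOpMutator(ast.NodeTransformer):
    """Swap the first eligible binary operator encountered in the AST."""
    def __init__(self):
        self.mutated = False

    def visit_BinOp(self, node):
        self.generic_visit(node)  # visit children first
        swap = OP_SWAPS.get(type(node.op))
        if swap is not None and not self.mutated:
            node.op = swap()      # e.g. replace `+` with `-`
            self.mutated = True
        return node

def mutate(source: str) -> str:
    """Parse, apply one mutation, and unparse back to source (Python 3.9+)."""
    tree = ast.parse(source)
    tree = BinOpMutator().visit(tree)
    return ast.unparse(tree)

print(mutate("def add(a, b):\n    return a + b"))
# prints a buggy variant: `return a - b` instead of `return a + b`
```

Training pairs of such (mutated, original) programs can be generated cheaply at scale, which is why, as the abstract notes, artificial bugs are attractive for pre-training before fine-tuning on scarce real bug fixes.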