Automated Program Repair (APR) is defined as the process of fixing a bug/defect in the source code, by an automated tool. APR tools have recently experienced promising results by leveraging state-of-the-art Neural Language Processing (NLP) techniques. APR tools such as TFix and CodeXGLUE combine text-to-text transformers with software-specific techniques are outperforming alternatives, these days. However, in most APR studies the train and test sets are chosen from the same set of projects. In reality, however, APR models are meant to be generalizable to new and different projects. Therefore, there is a potential threat that reported APR models with high effectiveness perform poorly when the characteristics of the new project or its bugs are different than the training set's(Domain Shift). In this study, we first define and measure the domain shift problem in automated program repair. Then, we then propose a domain adaptation framework that can adapt an APR model for a given target project. We conduct an empirical study with three domain adaptation methods FullFineTuning, TuningWithLightWeightAdapterLayers, and CurriculumLearning using two state-of-the-art domain adaptation tools (TFix and CodeXGLUE) and two APR models on 611 bugs from 19 projects. The results show that our proposed framework can improve the effectiveness of TFix by 13.05% and CodeXGLUE by 23.4%. Another contribution of this study is the proposal of a data synthesis method to address the lack of labelled data in APR. We leverage transformers to create a bug generator model. We use the generated synthetic data to domain adapt TFix and CodeXGLUE on the projects with no data (Zero-shot learning), which results in an average improvement of 5.76% and 24.42% for TFix and CodeXGLUE, respectively.
翻译:自动程序修理( APR) 被定义为用自动工具来修正源代码中的错误/缺陷的过程。 RAPR 工具最近通过使用最新科技神经语言处理( NLP) 技术, 取得了大有希望的成果。 如 TFix 和 CodXGLUE 等工具将文本到文本变换器与软件特定技术相结合, 这些天。 然而, 在大多数 RA 研究中, 火车和测试组是从同一个项目组中选择的。 然而, 实际上, APR 模型是用来向新的和不同的项目推广的。 因此, 当新项目或其错误的特性不同于培训集( Domain Shift) 时, RA 工具最近出现了一个潜在的威胁: TFix 和 CodeGLLUE 工具中, 我们首先在自动程序修理中定义和测量域代码变换变换问题。 我们用三个域变换方法来进行实验性研究。 Flotfine Turning- LightLaightLayter Z, 和 CechLLLLADLADF 分别使用两个数据格式的变换方法, 我们的代码LDUEFIDUE 和变换了一个数据项目。 和变换了RDFTFTFTFTFTF.