Automatic Program Repair (APR) aims at fixing buggy source code with less manual debugging efforts, which plays a vital role in improving software reliability and development productivity. Recent APR works have achieved remarkable progress via applying deep learning (DL), particularly neural machine translation (NMT) techniques. However, we observe that existing DL-based APR models suffer from at least two severe drawbacks: (1) Most of them can only generate patches for a single programming language, as a result, to repair multiple languages, we have to build and train many repairing models. (2) Most of them are developed in an offline manner. Therefore, they won't function when there are new-coming requirements. To address the above problems, a T5-based APR framework equipped with continual learning ability across multiple programming languages is proposed, namely \emph{C}ont\emph{I}nual \emph{R}epair a\emph{C}ross Programming \emph{L}anguag\emph{E}s (\emph{CIRCLE}). Specifically, (1) CIRCLE utilizes a prompting function to narrow the gap between natural language processing (NLP) pre-trained tasks and APR. (2) CIRCLE adopts a difficulty-based rehearsal strategy to achieve lifelong learning for APR without access to the full historical data. (3) An elastic regularization method is employed to strengthen CIRCLE's continual learning ability further, preventing it from catastrophic forgetting. (4) CIRCLE applies a simple but effective re-repairing method to revise generated errors caused by crossing multiple programming languages. We train CIRCLE for four languages (i.e., C, JAVA, JavaScript, and Python) and evaluate it on five commonly used benchmarks. The experimental results demonstrate that CIRCLE not only effectively and efficiently repairs multiple programming languages in continual learning settings, but also achieves state-of-the-art performance with a single repair model.
翻译:自动程序修理( APR) 旨在用较少人工调试的努力来修补错误源码, 这在提高软件的可靠性和发展生产率方面起着至关重要的作用。 最近的 PRRA 工作通过应用深层学习( DL), 特别是神经机翻译( NMT) 技术, 取得了显著的进展。 然而, 我们观察到, 基于 DL 的 PRA 模式至少有两个严重的缺陷:(1) 多数模式只能为单一编程语言生成补丁, 从而修复多种语言。 (2) 多数模式是以离线方式开发的。 因此, 当有新的需求时, 它们将无法发挥功能。 为了解决上述问题, 提议采用基于 T5 的 RA 框架, 配备了多种程序翻译能力, 即 emph{C} { I}, 以 empleph{C} 语言生成补补补补补补补补补补补, 使用 IMLELELER IMLE IMLA 。 (creal) 具体地, 使用 CREARC IMLIL IML IML IML IML IMLE IMLE, 和 IML IMLE IMLE IML IML IM IML IM IM IM IM 。 ( IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP IP 。