Automated program repair (APR) aims to fix software bugs automatically and plays a crucial role in software development and maintenance. With recent advances in deep learning (DL), a growing number of APR techniques leverage neural networks to learn bug-fixing patterns from massive open-source code repositories. Such learning-based techniques usually treat APR as a neural machine translation (NMT) task, in which buggy code snippets (i.e., the source language) are automatically translated into fixed code snippets (i.e., the target language). Benefiting from the ability of DL to learn hidden relationships from previous bug-fixing datasets, learning-based APR techniques have achieved remarkable performance. In this paper, we provide a systematic survey that summarizes the current state-of-the-art research in the learning-based APR community. We illustrate the general workflow of learning-based APR techniques and detail its crucial components, including the fault localization, patch generation, patch ranking, patch validation, and patch correctness phases. We then discuss the widely adopted datasets and evaluation metrics and outline existing empirical studies. We discuss several critical aspects of learning-based APR techniques, such as repair domains, industrial deployment, and the open science issue. We highlight several practical guidelines on applying DL techniques in future APR studies, such as exploring explainable patch generation and utilizing code features. Overall, our paper can help researchers gain a comprehensive understanding of the achievements of existing learning-based APR techniques and promote the practical application of these techniques. Our artifacts are publicly available at https://github.com/QuanjunZhang/AwesomeLearningAPR.