Automated Program Repair (APR) aspires to automatically generate patches for an input buggy program. Traditional APR tools typically target specific bug types and fixes through templates, heuristics, and formal specifications. However, these techniques are limited in the bug types and patch variety they can produce. As such, researchers have designed various learning-based APR tools, with recent work focused on directly using Large Language Models (LLMs) for APR. While LLM-based APR tools are able to achieve state-of-the-art performance on many repair datasets, the LLMs used for direct repair are not fully aware of project-specific information such as unique variable or method names. The plastic surgery hypothesis is a well-known insight in APR, stating that the code ingredients needed to fix a bug usually already exist within the same project. Traditional APR tools have largely leveraged the plastic surgery hypothesis by designing manual or heuristic-based approaches to exploit such existing code ingredients. However, as recent APR research has shifted toward LLM-based approaches, the plastic surgery hypothesis has been largely ignored. In this paper, we ask the following question: how useful is the plastic surgery hypothesis in the era of LLMs? Interestingly, LLM-based APR presents a unique opportunity to fully automate the plastic surgery hypothesis via fine-tuning and prompting. To this end, we propose FitRepair, which combines the direct usage of LLMs with two domain-specific fine-tuning strategies and one prompting strategy for more powerful APR. Our experiments on the widely studied Defects4J 1.2 and 2.0 datasets show that FitRepair fixes 89 and 44 bugs, respectively, outperforming the best-performing baseline by 15 and 8 bugs, demonstrating a promising future for the plastic surgery hypothesis in the era of LLMs.
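To make the prompting side of the idea concrete, the following is a minimal, hypothetical sketch of how the plastic surgery hypothesis could be operationalized for an LLM: harvest identifiers from other files in the same project, rank them by lexical overlap with the buggy line, and surface the most relevant ones as a hint in a cloze-style (infilling) prompt. The function names and prompt format below are illustrative assumptions for exposition, not FitRepair's actual implementation.

```python
import re
from collections import Counter
from pathlib import Path

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def project_identifiers(project_root: str) -> Counter:
    """Count identifier occurrences across all Java files in the project."""
    counts = Counter()
    for path in Path(project_root).rglob("*.java"):
        counts.update(IDENT.findall(path.read_text(errors="ignore")))
    return counts

def relevant_identifiers(buggy_line: str, counts: Counter, top_k: int = 10) -> list:
    """Rank project identifiers by crude token overlap with the buggy line."""
    line_tokens = {t.lower() for t in IDENT.findall(buggy_line)}
    def score(name: str) -> int:
        return sum(tok in name.lower() for tok in line_tokens)
    candidates = [n for n in counts if score(n) > 0]
    return sorted(candidates, key=lambda n: (score(n), counts[n]), reverse=True)[:top_k]

def build_repair_prompt(prefix: str, suffix: str, hints: list) -> str:
    """Cloze-style prompt: context around the masked buggy line plus project hints."""
    return (
        "// Relevant project identifiers: " + ", ".join(hints) + "\n"
        + prefix + "\n<MASK>\n" + suffix
    )
```

Under this sketch, the constructed prompt would then be passed to an infilling-capable LLM, which fills `<MASK>` with a candidate patch that can reuse the project-specific ingredients supplied as hints.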