Advances in large language models (LLMs) have paved the way for automated software vulnerability repair approaches that iteratively refine a patch until it becomes plausible. Nevertheless, existing LLM-based vulnerability repair approaches face two notable limitations: 1) they ignore where the patch should be applied and focus solely on the repair content; 2) they lack quality assessment of the generated candidate patches during the iterative process. To tackle these two limitations, we propose \sysname, an LLM-based approach that first pinpoints the locations that should be patched. Furthermore, \sysname improves the iterative repair strategy by assessing the quality of test-failing patches and selecting the best one for the next iteration. We introduce two dimensions to assess patch quality: whether the patch introduces new vulnerabilities and its taint statement coverage. We evaluated \sysname on VulnLoc+, a real-world C/C++ vulnerability repair dataset containing 40 vulnerabilities and their Proofs-of-Vulnerability. The experimental results demonstrate that \sysname substantially outperforms state-of-the-art Neural Machine Translation-based, Program Analysis-based, and LLM-based vulnerability repair approaches. Specifically, \sysname generates 27 plausible patches, which is comparable to or 8 to 22 more than the baselines. In terms of correct patch generation, \sysname repairs 8 to 13 more vulnerabilities than existing approaches.