Automated Program Repair (APR) aims to automatically generate patches for buggy programs. Recent APR work has been focused on leveraging modern Large Language Models (LLMs) to directly generate patches for APR. Such LLM-based APR tools work by first constructing an input prompt built using the original buggy code and then queries the LLM to generate patches. While the LLM-based APR tools are able to achieve state-of-the-art results, it still follows the classic Generate and Validate repair paradigm of first generating lots of patches and then validating each one afterwards. This not only leads to many repeated patches that are incorrect but also miss the crucial information in test failures as well as in plausible patches. To address these limitations, we propose ChatRepair, the first fully automated conversation-driven APR approach that interleaves patch generation with instant feedback to perform APR in a conversational style. ChatRepair first feeds the LLM with relevant test failure information to start with, and then learns from both failures and successes of earlier patching attempts of the same bug for more powerful APR. For earlier patches that failed to pass all tests, we combine the incorrect patches with their corresponding relevant test failure information to construct a new prompt for the LLM to generate the next patch. In this way, we can avoid making the same mistakes. For earlier patches that passed all the tests, we further ask the LLM to generate alternative variations of the original plausible patches. In this way, we can further build on and learn from earlier successes to generate more plausible patches to increase the chance of having correct patches. While our approach is general, we implement ChatRepair using state-of-the-art dialogue-based LLM -- ChatGPT. By calculating the cost of accessing ChatGPT, we can fix 162 out of 337 bugs for \$0.42 each!
翻译:自动程序修复(APR)旨在自动生成有错误的程序的修补程序。最近的APR工作集中在利用现代的大型语言模型(LLM)直接生成APR的补丁。这种基于LLM的APR工具的工作方式是首先使用原始的有错误代码构建输入提示,然后查询LLM以生成补丁。虽然LLM基础的APR工具能够取得最先进的结果,但它仍然采用先生成大量的补丁,然后进行验证的经典生成和验证修复范式。这不仅会导致许多不正确的重复补丁,而且会错过测试失败和可行补丁中的关键信息。为了解决这些限制,我们提出了ChatRepair,这是一种全自动的、对话驱动的APR方法,它使用即时反馈来以对话方式执行APR。ChatRepair首先提供相关的测试失败信息,以便开始使用LLM,然后从同一错误的早期修补尝试的失败和成功中学习,以获得更强大的APR。对于早期未能通过所有测试的修补程序,我们将不正确的修补程序与相应的相关测试失败信息结合起来,构建一个新的提示,用于生成下一个补丁。通过这种方法,我们可以避免犯同样的错误。对于早期通过了所有测试的修补程序,我们进一步要求LLM生成原始可行补丁的替代变体。通过这种方式,我们可以进一步建立和学习以生成更有可能正确的补丁,从而增加获得正确补丁的机会。虽然我们的方法是通用的,但我们使用最先进的基于对话的LLM ChatGPT实现了ChatRepair。通过计算访问ChatGPT的成本,我们可以以每个错误成本仅为0.42美元的成本修复337个错误中的162个!