Automated Program Repair (APR) can help developers automatically generate patches for bugs. Due to the impressive performance obtained with Large Pre-Trained Language Models (LLMs) on many code-related tasks, researchers have started to directly use LLMs for APR. However, prior approaches simply sample the LLM repeatedly given the same constructed input/prompt created from the original buggy code, which not only leads to generating the same incorrect patches over and over but also misses the critical information in test cases. To address these limitations, we propose conversational APR, a new paradigm for program repair that alternates between patch generation and validation in a conversational manner. In conversational APR, we iteratively build the input to the model by combining previously generated patches with validation feedback. As such, we leverage the long-term context window of LLMs to not only avoid generating previously incorrect patches but also incorporate validation feedback to help the model understand the semantic meaning of the program under test. We evaluate 10 different LLMs, including the newly developed ChatGPT model, to demonstrate the improvement of conversational APR over the prior LLM-for-APR approach.
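As a rough illustration of the generate-validate-feedback loop described above, the following is a minimal sketch (not the paper's actual implementation); the helper names `llm.generate` and `run_tests` are hypothetical placeholders standing in for an LLM sampling call and a test-execution harness.

```python
# Minimal sketch of a conversational APR loop: generate a patch, validate it
# against the test cases, and feed the failed patch plus test feedback back
# into the prompt for the next turn. All names here are illustrative.

def conversational_apr(buggy_code, tests, llm, max_turns=10):
    """Alternate between patch generation and validation, accumulating
    the conversation history (prior patches + feedback) in the prompt."""
    history = [f"Fix the following buggy code:\n{buggy_code}"]
    for _ in range(max_turns):
        prompt = "\n".join(history)                  # long-term context: all prior turns
        patch = llm.generate(prompt)                 # sample one candidate patch (hypothetical API)
        passed, feedback = run_tests(patch, tests)   # validate against the test cases (hypothetical API)
        if passed:
            return patch                             # plausible patch found
        # Append the failed patch and its validation feedback so the model can
        # avoid regenerating it and can use the failing test output as a hint.
        history.append(f"Candidate patch:\n{patch}")
        history.append(f"Test feedback:\n{feedback}\nPlease try again.")
    return None                                      # no plausible patch within the turn budget
```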