Automatic program repair at the project level may open up as-yet-unseen opportunities in various fields of human activity. Since the SWE-Bench challenge was introduced, numerous solutions have been proposed. Patch generation is one part of program repair, and test-suite-based conversational patch generation has proven effective. However, the potential of conversational patch generation has not yet been evaluated specifically on SWE-Bench. This study reports experimental results that assess the effectiveness of conversational patch generation in isolation on problems from SWE-Bench. The experiments show that a simple conversational pipeline based on LLaMA 3.1 70B can generate valid patches in 47\% of cases, which is comparable to the state of the art in program repair on SWE-Bench.