Students often make mistakes on introductory programming assignments as part of the learning process. Providing custom repairs for these mistakes can cost class instructors substantial time and effort. Automated program repair (APR) techniques can synthesize such fixes. Prior work has explored symbolic and neural techniques for APR in the education domain, but these approaches require either substantial engineering effort or large amounts of data and training. We propose using a large language model trained on code, such as Codex, to build an APR system -- MMAPR -- for introductory Python programming assignments. Our system can fix both syntactic and semantic mistakes by combining multi-modal prompts, iterative querying, test-case-based selection of few-shot examples, and program chunking. We evaluate MMAPR on 286 real student programs and compare it to a baseline built by combining a state-of-the-art Python syntax repair engine, BIFI, with a state-of-the-art Python semantic repair engine for student assignments, Refactory. We find that MMAPR can fix more programs than this baseline and produces smaller patches on average.
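To make the combination of iterative querying and test-case-based validation more concrete, the following is a minimal sketch rather than the MMAPR implementation. It assumes that candidate repairs are checked against instructor test cases and that, among passing candidates, the one with the smallest line-level diff is preferred; that selection rule is an assumption made here for illustration. The names `repair`, `passes_all_tests`, `patch_size`, and `fake_model` are hypothetical, and `fake_model` returns canned strings in place of a real call to a model such as Codex.

```python
import difflib
from typing import Callable, Iterable, Optional

# A test case is (positional arguments, expected return value).
TestCase = tuple[tuple, object]


def passes_all_tests(source: str, func_name: str, tests: list[TestCase]) -> bool:
    """Execute candidate source and check the named function on every test.

    NOTE: exec() is unsandboxed and has no timeout; a real system would need both.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)  # a SyntaxError here is caught below
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False


def patch_size(original: str, candidate: str) -> int:
    """Count added plus removed lines between two programs."""
    diff = difflib.ndiff(original.splitlines(), candidate.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


def repair(
    buggy: str,
    func_name: str,
    tests: list[TestCase],
    propose_candidates: Callable[[str, int], Iterable[str]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Query for candidates round by round; return the smallest passing patch."""
    for round_index in range(max_rounds):
        passing = [
            candidate
            for candidate in propose_candidates(buggy, round_index)
            if passes_all_tests(candidate, func_name, tests)
        ]
        if passing:
            return min(passing, key=lambda c: patch_size(buggy, c))
    return None  # no candidate passed within the query budget


# Hypothetical student program with a syntax error (missing colon)
# and a semantic error (returns x instead of -x for negatives).
BUGGY = "def absolute(x):\n    if x < 0\n        return x\n    return x\n"
SYNTAX_ONLY_FIX = "def absolute(x):\n    if x < 0:\n        return x\n    return x\n"
LARGE_FIX = (
    "def absolute(x):\n    result = x\n    if result < 0:\n"
    "        result = -result\n    return result\n"
)
SMALL_FIX = "def absolute(x):\n    if x < 0:\n        return -x\n    return x\n"


def fake_model(program: str, round_index: int) -> list[str]:
    """Stand-in for a language-model query; returns canned candidates."""
    if round_index == 0:
        return [SYNTAX_ONLY_FIX]  # parses, but still fails a test
    return [LARGE_FIX, SMALL_FIX]


TESTS: list[TestCase] = [((3,), 3), ((-4,), 4), ((0,), 0)]
fixed = repair(BUGGY, "absolute", TESTS, fake_model)
```

In this toy run the first round yields a candidate that parses but still fails a test, so a second round is queried; both second-round candidates pass, and the one that changes fewer lines of the student's program is returned.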