Large language models trained on code (LLMCs), such as Codex, hold great promise in enhancing programming education by automatically generating feedback for students. We investigate using LLMCs to generate feedback for fixing syntax errors in Python programs, a key scenario in introductory programming. More concretely, given a student's buggy program, our goal is to generate feedback comprising a fixed program along with a natural language explanation describing the errors/fixes, inspired by how a human tutor would give feedback. While using LLMCs is promising, the critical challenge is to ensure high precision in the generated feedback, which is imperative before deploying such technology in classrooms. The main research question we study is: Can we develop LLMC-based feedback generation techniques with a tunable precision parameter, giving educators quality control over the feedback that students receive? To this end, we introduce PyFiXV, our technique, powered by Codex, for generating high-precision feedback. The key idea behind PyFiXV is to use a novel run-time validation mechanism to decide whether the generated feedback is suitable for sharing with the student; notably, this validation mechanism also provides a precision knob to educators. We perform an extensive evaluation using two real-world datasets of Python programs with syntax errors and show the efficacy of PyFiXV in generating high-precision feedback.
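To make the run-time validation idea concrete, below is a minimal Python sketch of such a mechanism: the candidate explanation is handed back to an LLM acting as a simulated student, and the feedback is accepted only if enough sampled repairs reproduce the intended fixed program. This is an illustrative sketch, not PyFiXV's actual implementation: `query_llm` is a hypothetical stand-in for sampling from a code LLM such as Codex, the prompt wording is invented, and exact string equality is only a placeholder matching criterion (a real system might normalize whitespace or compare parse trees).

```python
import ast

def is_valid_python(program: str) -> bool:
    """Check whether a program parses without syntax errors."""
    try:
        ast.parse(program)
        return True
    except SyntaxError:
        return False

def validate_feedback(buggy_program: str, fixed_program: str, explanation: str,
                      query_llm, n_samples: int = 10, threshold: float = 0.5) -> bool:
    """Run-time validation: use the LLM as a simulated student.

    The LLM is given the buggy program plus the candidate explanation and
    asked to repair the program. Feedback is accepted only if at least a
    `threshold` fraction of the `n_samples` sampled repairs reproduce the
    intended fixed program. (`n_samples`, `threshold`) act as the
    educator-facing precision knob: raising the threshold trades coverage
    for precision.
    """
    # The fixed program itself must be syntactically valid.
    if not is_valid_python(fixed_program):
        return False

    # Hypothetical prompt; the actual prompt used by PyFiXV may differ.
    prompt = (
        "Fix the following buggy Python program using the hint.\n"
        f"### Buggy program:\n{buggy_program}\n"
        f"### Hint:\n{explanation}\n"
        "### Fixed program:\n"
    )
    matches = 0
    for _ in range(n_samples):
        candidate = query_llm(prompt)  # one sampled repair
        if candidate.strip() == fixed_program.strip():
            matches += 1
    return matches / n_samples >= threshold

# Example usage with a stubbed LLM that always returns the intended fix:
buggy = "print('hello'"
fixed = "print('hello')"
hint = "Line 1 is missing a closing parenthesis after the string."
accepted = validate_feedback(buggy, fixed, hint, query_llm=lambda prompt: fixed)
print(accepted)  # True: all simulated repairs match the intended fix
```

The design intuition is that an explanation precise enough to guide a simulated student to the exact fix is likely precise enough to show a real student; tightening `threshold` raises precision at the cost of rejecting more feedback.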