High-quality computer science education is limited by the difficulty of providing instructor feedback to students at scale. While this feedback could in principle be automated, supervised approaches to predicting the correct feedback are bottlenecked by the intractability of annotating large quantities of student code. In this paper, we instead frame the problem of providing feedback as few-shot classification, where a meta-learner adapts to give feedback to student code on a new programming question from just a few examples annotated by instructors. Because data for meta-training is limited, we propose a number of amendments to the typical few-shot learning framework, including task augmentation to create synthetic tasks, and additional side information to build stronger priors about each task. These additions are combined with a transformer architecture that embeds discrete sequences (e.g. code) into a prototypical representation of a feedback class label. On a suite of few-shot natural language processing tasks, we match or exceed state-of-the-art performance. Then, on a collection of student solutions to exam questions from an introductory university course, we show that our approach reaches an average precision of 88% on unseen questions, surpassing the 82% precision of teaching assistants. Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university. This is, to the best of our knowledge, the first successful deployment of machine-learning-based feedback on open-ended student code.
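To make the core idea concrete, below is a minimal sketch of a few-shot episode in the prototypical-network style the abstract describes: a transformer encoder embeds token sequences (e.g. tokenized student code), per-label prototypes are averaged from a handful of instructor-annotated support examples, and queries are classified by distance to those prototypes. This is an illustrative PyTorch sketch, not the authors' released implementation; all class names, hyperparameters, and the mean-pooling choice are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProtoEncoder(nn.Module):
    """Transformer encoder mapping a token sequence to one embedding
    (mean-pooled over positions). Hyperparameters are illustrative."""
    def __init__(self, vocab_size=10000, dim=128, heads=4, layers=2, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.pos = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        pos = torch.arange(tokens.size(1), device=tokens.device)
        h = self.encoder(self.tok(tokens) + self.pos(pos))
        return h.mean(dim=1)                         # (batch, dim)

def prototypical_loss(encoder, support_x, support_y, query_x, query_y, n_classes):
    """One few-shot episode: build one prototype per feedback label from
    the support set, then classify queries by nearest prototype."""
    z_support = encoder(support_x)                   # (n_support, dim)
    z_query = encoder(query_x)                       # (n_query, dim)
    prototypes = torch.stack([
        z_support[support_y == c].mean(dim=0) for c in range(n_classes)
    ])                                               # (n_classes, dim)
    logits = -torch.cdist(z_query, prototypes) ** 2  # negative squared distance
    return F.cross_entropy(logits, query_y)
```

During meta-training, many such episodes (including the synthetic tasks produced by task augmentation) would be sampled and the encoder optimized end-to-end; at deployment, a few instructor-labeled examples for a new question suffice to form prototypes for its feedback labels.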