Early Performance Prediction using Interpretable Patterns in Programming Process Data

Instructors have limited time and resources to help struggling students, and these resources should be directed to the students who most need them. To address this, researchers have constructed models that can predict students' final course performance early in a semester. However, many predictive models are limited to static and generic student features (e.g., demographics, GPA), rather than computing-specific evidence that assesses a student's progress in class. Many programming environments now capture complete time-stamped records of students' actions during programming. In this work, we leverage this rich, fine-grained log data to build a model to predict student course outcomes. From the log data, we extract patterns of behaviors that are predictive of students' success using an approach called differential sequence mining. We evaluate our approach on a dataset from 106 students in a block-based, introductory programming course. The patterns extracted by our approach can predict final programming performance with 79% accuracy using only the first programming assignment, outperforming two baseline methods. In addition, we show that the patterns are interpretable and correspond to concrete, effective -- and ineffective -- novice programming behaviors. We also discuss these patterns and their implications for classroom instruction.
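
The core idea behind differential sequence mining, finding action patterns whose frequency differs between two groups of students, can be illustrated with a minimal sketch. This is not the authors' implementation: the action names (`edit`, `run`, `test`, `delete`), the `high`/`low` groups, the `min_gap` threshold, and all function names are hypothetical, and the sketch compares only per-group support of contiguous patterns. The full technique generally also applies a significance test on per-sequence pattern frequencies, which is omitted here.

```python
def contains(seq, pattern):
    """Return True if `pattern` occurs in `seq` as a contiguous run."""
    n = len(pattern)
    return any(tuple(seq[i:i + n]) == pattern for i in range(len(seq) - n + 1))


def support(sequences, pattern):
    """Fraction of sequences in the group that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


def candidate_patterns(sequences, max_len=2):
    """Collect every contiguous n-gram of length 1..max_len as a candidate."""
    candidates = set()
    for s in sequences:
        for n in range(1, max_len + 1):
            for i in range(len(s) - n + 1):
                candidates.add(tuple(s[i:i + n]))
    return candidates


def differential_patterns(high, low, min_gap=0.5, max_len=2):
    """Keep patterns whose support differs between groups by at least min_gap.

    A positive gap means the pattern is more common in `high`;
    a negative gap means it is more common in `low`.
    """
    kept = []
    for p in candidate_patterns(high + low, max_len):
        gap = support(high, p) - support(low, p)
        if abs(gap) >= min_gap:
            kept.append((p, gap))
    # Largest absolute gap first; ties broken by the pattern itself.
    return sorted(kept, key=lambda item: (-abs(item[1]), item[0]))


# Hypothetical action logs, one list per student (assumed data, not from the study).
high = [["edit", "run", "edit", "run"], ["edit", "run", "test"]]
low = [["edit", "edit", "edit", "run"], ["edit", "edit", "delete"]]

patterns = differential_patterns(high, low)
for pattern, gap in patterns:
    print(pattern, gap)
```

On this toy data the strongest pattern is `("edit", "edit")` with a gap of -1.0: it appears in every low-group log and in no high-group log. Patterns like this could then serve as features for a classifier that predicts course outcomes.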
