Time series is the most prevalent form of input data for educational prediction tasks. The vast majority of research using time series data focuses on hand-crafted features, designed by experts for predictive performance and interpretability. However, extracting these features is costly in both human effort and computation. In this paper, we propose an approach that uses irregular multivariate time series modeling with graph neural networks to achieve accuracy comparable to or better than that of hand-crafted features while operating directly on raw time series clickstreams. Furthermore, we extend concept activation vectors to provide interpretability in raw time series models. We analyze these advances in the education domain, addressing the task of early student performance prediction for downstream targeted interventions and instructional support. Our experimental analysis on 23 MOOCs, with millions of combined interactions over six behavioral dimensions, shows that models designed with our approach can (i) outperform state-of-the-art educational time series baselines with no feature extraction and (ii) provide interpretable insights for personalized interventions. Source code: https://github.com/epfl-ml4ed/ripple/.