Knowledge tracing (KT) is the task of using students' historical learning interaction data to model their knowledge mastery over time so as to predict their future interaction performance. Recently, remarkable progress has been made in using various deep learning techniques to solve the KT problem. However, the reasons behind the success of deep learning based knowledge tracing (DLKT) approaches remain somewhat unclear, and proper measurement and analysis of these approaches remain a challenge. First, data preprocessing procedures in existing works are often private and custom-built, which limits experimental standardization. Furthermore, existing DLKT studies often differ in their evaluation protocols and are far from real-world educational contexts. To address these problems, we introduce a comprehensive Python-based benchmark platform, \textsc{pyKT}, to guarantee valid comparisons across DLKT methods via thorough evaluations. The \textsc{pyKT} library consists of a standardized set of integrated data preprocessing procedures on 7 popular datasets across different domains, and 10 frequently compared DLKT model implementations for transparent experiments. Results from our fine-grained and rigorous empirical KT studies yield a set of observations and suggestions for effective DLKT, e.g., a wrong evaluation setting may cause label leakage that generally leads to performance inflation; and the improvement of many DLKT approaches is minimal compared to the very first DLKT model proposed by Piech et al. \cite{piech2015deep}. We have open-sourced \textsc{pyKT} and our experimental results at https://pykt.org/. We welcome contributions from other research groups and practitioners.
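To make the task definition concrete, the following is a minimal sketch of the KT prediction setup, not \textsc{pyKT}'s API and not any of the benchmarked DLKT models. The function name `running_mean_predictions` and the `prior` parameter are hypothetical, introduced only for illustration. It assumes one student's responses are given as a list of 0/1 correctness labels, and it predicts each response from strictly earlier responses only, which is the property an evaluation must preserve to avoid leaking labels into the prediction.

```python
from typing import List


def running_mean_predictions(responses: List[int], prior: float = 0.5) -> List[float]:
    """Predict P(correct) at each step from the student's earlier responses.

    Hypothetical baseline for illustration: the prediction at step t is a
    smoothed mean of responses[:t], so the label at step t is never used
    to predict itself.
    """
    predictions = []
    correct_so_far = 0
    for t, response in enumerate(responses):
        # Only history before step t contributes; `prior` acts as one
        # pseudo-observation so the first prediction is well defined.
        predictions.append((correct_so_far + prior) / (t + 1))
        # The true label is revealed only after the prediction is made.
        correct_so_far += response
    return predictions


if __name__ == "__main__":
    history = [1, 1, 0, 1]  # assumed toy data: 1 = correct, 0 = incorrect
    print(running_mean_predictions(history))
```

Because each prediction depends only on the prefix of the sequence, changing a later response leaves all earlier predictions unchanged; a check of this kind is one simple way to test an evaluation pipeline for leakage.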