Knowledge tracing (KT) is the task of using students' historical learning interaction data to model their knowledge mastery over time so as to predict their future interaction performance. Recently, remarkable progress has been made in using various deep learning techniques to solve the KT problem. However, the reasons behind the success of deep learning based knowledge tracing (DLKT) approaches remain somewhat mysterious, and proper measurement and analysis of these approaches remain a challenge. First, data preprocessing procedures in existing works are often private and/or custom, which limits experimental standardization. Furthermore, existing DLKT studies often differ in their evaluation protocols and are far from real-world educational contexts. To address these problems, we introduce a comprehensive Python-based benchmark platform, \textsc{pyKT}, to guarantee valid comparisons across DLKT methods via thorough evaluations. The \textsc{pyKT} library consists of a standardized set of integrated data preprocessing procedures on 7 popular datasets across different domains, and 10 frequently compared DLKT model implementations for transparent experiments. Results from our fine-grained and rigorous empirical KT studies yield a set of observations and suggestions for effective DLKT, e.g., a wrong evaluation setting may cause label leakage that generally leads to performance inflation; and the improvement of many DLKT approaches is minimal compared to the very first DLKT model proposed by Piech et al. \cite{piech2015deep}. We have open sourced \textsc{pyKT} and our experimental results at \url{https://pykt.org/}. We welcome contributions from other research groups and practitioners.