In this paper, we study knowledge tracing in the domain of programming education and make two contributions. First, we collect and release BePKT, the most comprehensive dataset to date, which covers a range of online behaviors in an online judge (OJ) system: textual programming problems, knowledge annotations, user-submitted code, and system-logged events. Second, we propose a new model, PDKT, that exploits this enriched context to predict student behavior accurately. Specifically, we construct a bipartite graph for programming problem embedding, design an improved pre-training model, PLCodeBERT, for code embedding, and use a double-sequence RNN model with exponential decay attention for effective feature fusion. Experimental results on BePKT show that the proposed model achieves state-of-the-art performance in programming knowledge tracing. In addition, we verify that our PLCodeBERT-based code embedding strategy complements existing knowledge tracing models and further improves their accuracy. As a by-product, PLCodeBERT also improves performance on other programming-related tasks such as code clone detection.
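To make the idea of exponential decay attention concrete, the following is a minimal sketch and not the PDKT implementation. The abstract does not specify the exact form, so we assume one common formulation: each past step's attention logit is reduced by a penalty proportional to its age, which multiplies its unnormalized weight by an exponentially decaying factor. The names `exp_decay_attention`, `attend`, and `decay_rate` are hypothetical.

```python
import math


def exp_decay_attention(scores, decay_rate):
    """Attention weights over a sequence of past steps (assumed form).

    scores[i] is a relevance score for step i; the last entry is the most
    recent step. Subtracting decay_rate * age from the logit is equivalent
    to scaling the unnormalized weight by exp(-decay_rate * age).
    """
    n = len(scores)
    logits = [s - decay_rate * (n - 1 - i) for i, s in enumerate(scores)]
    # Subtract the maximum logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend(values, weights):
    """Weighted sum of equal-length value vectors."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# Illustrative inputs only: three past steps with equal relevance scores.
weights = exp_decay_attention([0.0, 0.0, 0.0], decay_rate=0.5)
context = attend([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], weights)
```

With equal scores and a positive `decay_rate`, more recent steps receive strictly larger weights; setting `decay_rate` to 0 recovers ordinary softmax attention.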