Student modeling, the task of inferring a student's learning characteristics from their interactions with coursework, is a fundamental problem in intelligent education. Although recent work in knowledge tracing and cognitive diagnosis has proposed several promising directions for improving the usability and effectiveness of current models, existing public datasets remain insufficient for these potential solutions because they lack complete exercising contexts, fine-grained concepts, and cognitive labels. In this paper, we present MoocRadar, a fine-grained, multi-aspect knowledge repository consisting of 2,513 exercise questions, 5,600 knowledge concepts, and over 12 million behavioral records. Specifically, we propose a framework to guarantee high-quality and comprehensive annotation of fine-grained concepts and cognitive labels. Statistical and experimental results indicate that our dataset provides a basis for future improvements to existing methods. Moreover, to make the dataset convenient for researchers to use, we release a set of tools for data querying, model adaptation, and extension of our repository, which are now available at https://github.com/THU-KEG/MOOC-Radar.