Cognitive diagnosis models (CDMs) are a popular tool for assessing students' mastery of sets of skills. Given a set of $K$ skills tested on an assessment, each student is classified into one of $2^K$ latent skill set profiles, each of which records whether the student has mastered each skill. Traditional approaches to estimating these profiles are computationally intensive and become infeasible on large datasets. Instead, proxy skill estimates can be generated from the observed responses and then clustered, and the resulting clusters can be assigned to profiles. Building on previous work, we consider how best to perform this clustering when not all $2^K$ profiles are possible, e.g., because of hierarchical relationships among the skills, and when not all possible profiles are present in the population. Using simulated student responses, we compare hierarchical clustering and several k-means variants, including semisupervised clustering. The empty k-means algorithm, paired with a novel method for generating starting centers, yields the best overall performance.
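The clustering idea can be illustrated with a minimal sketch. It is not the authors' implementation, and in particular it does not reproduce their novel method for generating starting centers. It assumes that proxy skill estimates are scaled to $[0, 1]$, that the permissible profiles themselves serve as starting centers, and that a cluster which receives no students simply keeps its center. The helper names `permissible_profiles` and `empty_kmeans` are hypothetical.

```python
import itertools
import numpy as np


def permissible_profiles(K, prerequisites):
    """Enumerate the 0/1 profiles over K skills that respect a hierarchy.

    prerequisites is a list of (a, b) pairs meaning skill a must be
    mastered before skill b can be (an assumed encoding of the hierarchy).
    """
    profiles = [
        bits
        for bits in itertools.product([0, 1], repeat=K)
        if all(bits[a] >= bits[b] for a, b in prerequisites)
    ]
    return np.array(profiles, dtype=float)


def assign(X, centers):
    """Return the index of the nearest center for every row of X."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)


def empty_kmeans(X, start_centers, n_iter=100):
    """k-means that tolerates empty clusters.

    A cluster with no members keeps its current center instead of being
    dropped or re-seeded, so profiles absent from the population can
    remain unassigned.
    """
    centers = start_centers.copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers


# Three skills in a linear hierarchy: skill 0 before 1, skill 1 before 2.
starts = permissible_profiles(3, [(0, 1), (1, 2)])

# Hypothetical proxy skill estimates for four students.
X = np.array([
    [0.10, 0.00, 0.05],
    [0.00, 0.10, 0.00],
    [0.90, 1.00, 0.95],
    [1.00, 0.90, 1.00],
])
labels, centers = empty_kmeans(X, starts)
```

With a linear hierarchy over three skills, only four of the eight profiles are permissible, so clustering starts from four centers rather than eight. In the toy data, only the "no skills" and "all skills" profiles attract students; the two intermediate clusters stay empty and their centers do not move.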