Curricula in long-cycle programmes are usually recorded in institutional databases as linear lists of courses, yet in practice they operate as directed graphs of prerequisite relationships that constrain how students progress. This paper introduces the CAPIRE Curriculum Graph, a structural feature engineering layer embedded within the CAPIRE framework for student attrition prediction in Civil Engineering at Universidad Nacional de Tucumán, Argentina. We formalise the curriculum as a directed acyclic graph, compute course-level centrality metrics to identify bottleneck and backbone courses, and derive nine structural features at the student-semester level that capture how students navigate the prerequisite network over time. These features include backbone completion rate, bottleneck approval ratio, credits blocked by incomplete prerequisites, and graph distance to graduation. Using Random Forest classifiers on 1,343 students across seven cohorts (2015-2021), we compare three model configurations: baseline CAPIRE, CAPIRE plus macro-context variables, and CAPIRE plus macro-context and structural features. While macro-context socioeconomic indicators fail to improve upon the baseline, structural curriculum features yield consistent performance gains: the best configuration achieves an overall accuracy of 86.66% and an F1-score of 88.08%, and improves balanced accuracy by 0.87 percentage points over a strong baseline. Ablation analysis further shows that the structural features contribute synergistically rather than through a single dominant metric. By making curriculum structure an explicit object in the feature layer, this work extends CAPIRE from a multilevel leakage-aware framework to a curriculum-constrained prediction system that bridges network science, educational data mining, and institutional research.
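To make the idea of structural features concrete, the following minimal sketch shows how a prerequisite graph could yield student-level quantities of the kind listed above. It is an illustration under stated assumptions, not the CAPIRE implementation: the toy curriculum, the function names, the use of descendant counts as a bottleneck proxy, and the use of the longest remaining prerequisite chain as a proxy for graph distance to graduation are all hypothetical choices made here.

```python
from collections import defaultdict

# Hypothetical toy curriculum: course -> (credits, prerequisite courses).
CURRICULUM = {
    "Calculus I": (6, []),
    "Physics I": (6, ["Calculus I"]),
    "Calculus II": (6, ["Calculus I"]),
    "Statics": (4, ["Physics I", "Calculus II"]),
    "Structures": (8, ["Statics"]),
}


def descendants(course, curriculum):
    """Courses that directly or transitively require `course`.

    A large descendant set is used here as an assumed, simple proxy
    for a bottleneck course.
    """
    children = defaultdict(set)
    for name, (_, prereqs) in curriculum.items():
        for prereq in prereqs:
            children[prereq].add(name)
    seen, stack = set(), [course]
    while stack:
        for nxt in children[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def blocked_credits(passed, curriculum):
    """Credits of unpassed courses with at least one unmet prerequisite."""
    return sum(
        credits
        for name, (credits, prereqs) in curriculum.items()
        if name not in passed and any(p not in passed for p in prereqs)
    )


def longest_remaining_chain(passed, curriculum):
    """Length of the longest chain of courses still to be passed.

    Assumed proxy for graph distance to graduation.
    """
    memo = {}

    def depth(course):
        if course in passed:
            return 0
        if course not in memo:
            prereqs = curriculum[course][1]
            memo[course] = 1 + max((depth(p) for p in prereqs), default=0)
        return memo[course]

    return max((depth(c) for c in curriculum), default=0)


# A student who has passed only the first course.
passed = {"Calculus I"}
print(len(descendants("Calculus I", CURRICULUM)))
print(blocked_credits(passed, CURRICULUM))
print(longest_remaining_chain(passed, CURRICULUM))
```

Recomputing such quantities each semester from the set of courses a student has passed gives one row per student-semester, which is the granularity at which the structural features are described.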