This article develops a Bayesian hierarchical framework to analyze academic performance in the second-semester 2022 Saber 11 examination in Colombia. Our approach combines multilevel regression with municipal and departmental spatial random effects, and it incorporates Ridge and Lasso regularization priors to compare the contributions of sociodemographic covariates. Inference uses Markov chain Monte Carlo (MCMC) methods in a fully open-source workflow. Model behavior is assessed on synthetic data that mirror key features of the observed data. Simulation results indicate that Ridge offers the most balanced performance across parameter recovery, predictive accuracy, and sampling efficiency. Lasso shows weaker fit and less stable posteriors, although its predictive accuracy improves under stronger multicollinearity. In the application, posterior rankings reveal a strong centralization of performance: central departments score higher and peripheral territories score lower. The strongest correlates of scores are student-level living conditions, maternal education, access to educational resources, gender, and ethnic background, while the spatial random effects capture residual regional disparities. A hybrid Bayesian segmentation based on K-means propagates posterior uncertainty into clustering at the departmental, municipal, and spatial scales. It reveals multiscale territorial patterns consistent with structural inequalities and informs territorial targeting in education policy.
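The abstract does not spell out how the K-means segmentation carries posterior uncertainty into the clusters. The sketch below shows one plausible scheme, and it rests on assumptions that go beyond the source. K-means is run on each MCMC draw of a set of area-level effects, and the resulting partitions are summarized as a co-clustering matrix, whose (i, j) entry estimates the posterior probability that areas i and j share a cluster. Because this summary ignores cluster labels, it sidesteps label switching across draws. The function names, the quantile initialization, the single scale, and the simulated draws are all hypothetical. The paper's actual procedure may differ.

```python
import numpy as np


def kmeans_1d(x, k, n_iter=100):
    """Lloyd's K-means on a 1-D array, with deterministic quantile initialization.

    Hypothetical helper: centers start at evenly spaced quantiles of x.
    """
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign each area to its nearest center.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Recompute centers; keep the previous center if a cluster empties.
        new_centers = np.array([
            x[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def coclustering_probabilities(draws, k):
    """Cluster every posterior draw and average the co-assignment indicators.

    draws: array of shape (S, J) holding S posterior draws of J area effects.
    Returns a J x J matrix whose (i, j) entry is the fraction of draws in which
    areas i and j land in the same cluster (invariant to label switching).
    """
    n_draws, n_areas = draws.shape
    probs = np.zeros((n_areas, n_areas))
    for s in range(n_draws):
        labels = kmeans_1d(draws[s], k)
        probs += labels[:, None] == labels[None, :]
    return probs / n_draws


# Hypothetical posterior draws for seven areas: three well-separated pairs
# plus one poorly identified area (index 6) lying between two groups.
rng = np.random.default_rng(2022)
post_means = np.array([-2.0, -2.0, 0.0, 0.0, 2.0, 2.0, 1.0])
post_sds = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.3])
draws = rng.normal(post_means, post_sds, size=(200, 7))

P = coclustering_probabilities(draws, k=3)
print(np.round(P, 2))
```

In this toy setting, the well-identified pairs are grouped together in every draw. The ambiguous area splits its membership between the middle and upper groups. That split is the kind of posterior uncertainty a single K-means run on posterior means would hide.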