This article develops a Bayesian hierarchical framework to analyze academic performance in the Saber 11 examination administered in Colombia in the second semester of 2022. Our approach combines multilevel regression with municipal and departmental spatial random effects, and it incorporates Ridge and Lasso regularization priors to compare the contribution of sociodemographic covariates. Inference is implemented in a fully open-source workflow using Markov chain Monte Carlo methods, and model behavior is assessed through synthetic data that mirror key features of the observed data. Simulation results indicate that Ridge provides the most balanced performance in parameter recovery, predictive accuracy, and sampling efficiency, while Lasso shows weaker fit and less stable posteriors, although its predictive accuracy improves under stronger multicollinearity. In the application, posterior rankings show a strong centralization of performance, with higher scores in central departments and lower scores in peripheral territories. The strongest correlates of scores are student-level living conditions, maternal education, access to educational resources, gender, and ethnic background, while spatial random effects capture residual regional disparities. A hybrid Bayesian segmentation based on K-means propagates posterior uncertainty into clustering at departmental, municipal, and spatial scales, revealing multiscale territorial patterns consistent with structural inequalities and informing territorial targeting in education policy.