Colleges and universities now use predictive analytics in a variety of ways to increase student success rates. Despite its potential, predictive analytics faces two major barriers to adoption in higher education: (a) the lack of democratization in deployment, and (b) the potential to exacerbate inequalities. Education researchers and policymakers encounter numerous challenges in deploying predictive modeling in practice. These challenges arise at different stages of the modeling pipeline, including data preparation, model development, and evaluation, and each of these stages can introduce additional bias into the system if not performed appropriately. Most large-scale, nationally representative education datasets suffer from a significant number of incomplete responses from research participants. Missing values are a frequent latent cause of many data analysis challenges. While many education-related studies have addressed the challenges of missing data, little is known about how the handling of missing values affects the fairness of predictive outcomes in practice. In this paper, we first assess the disparities in predictive modeling outcomes for college-student success, and then investigate the impact of imputation techniques on model performance and fairness using a comprehensive set of common metrics. Our comprehensive analysis of a real, large-scale education dataset reveals key insights into modeling disparities and how different imputation techniques fundamentally compare to one another in terms of their impact on the fairness of student-success prediction outcomes.
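To make the kind of comparison described above concrete, the following is a minimal sketch, not the paper's actual pipeline or dataset, of how one might measure the joint effect of an imputation choice on predictive accuracy and on a simple group-fairness metric (statistical parity difference). It uses scikit-learn with synthetic data and a hypothetical binary sensitive attribute; all names and numbers are placeholders for illustration only.

```python
# Illustrative sketch (synthetic data, hypothetical sensitive attribute):
# compare how common imputation strategies affect both accuracy and a
# simple fairness metric for a binary student-success predictor.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a student dataset: 2 features, a binary sensitive
# attribute, and a binary "success" label; ~20% of feature values go missing.
n = 2000
group = rng.integers(0, 2, size=n)  # hypothetical sensitive attribute
X = rng.normal(size=(n, 2)) + group[:, None] * 0.5
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0.5).astype(int)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "knn": KNNImputer(n_neighbors=5),
    "iterative": IterativeImputer(random_state=0),
}

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X_missing, y, group, test_size=0.3, random_state=0
)

for name, imputer in imputers.items():
    # Fit the imputer on training data only, then impute both splits.
    X_tr_imp = imputer.fit_transform(X_tr)
    X_te_imp = imputer.transform(X_te)

    clf = LogisticRegression(max_iter=1000).fit(X_tr_imp, y_tr)
    pred = clf.predict(X_te_imp)

    acc = (pred == y_te).mean()
    # Statistical parity difference: gap in positive-prediction rates across groups.
    spd = pred[g_te == 1].mean() - pred[g_te == 0].mean()
    print(f"{name:>9}: accuracy={acc:.3f}, statistical parity diff={spd:+.3f}")
```

In a real study, the same loop would be run over the fairness and performance metrics of interest, with the sensitive attribute taken from the actual data rather than simulated.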