Biases have marked medical history, leading to unequal care for marginalised groups. The patterns of missingness in observational data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is too often a forgotten preprocessing step. At best, practitioners guide the imputation choice by optimising overall performance, ignoring how this preprocessing can reinforce inequities. Our work questions this practice by studying how imputation affects downstream algorithmic fairness. First, we provide a structured view of the relationship between clinical presence mechanisms and group-specific missingness patterns. Then, through simulations and real-world experiments, we demonstrate that the imputation choice influences marginalised-group performance and that no imputation strategy consistently reduces disparities. Importantly, our results show that current practices may endanger health equity: imputation strategies that perform similarly at the population level can affect marginalised groups in different ways. Finally, we propose recommendations for mitigating inequity stemming from this neglected step of the machine learning pipeline.
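To make the central claim concrete, the following is a minimal sketch of the kind of evaluation the abstract argues for: comparing imputation strategies by group-level downstream performance rather than overall performance alone. The simulated group-dependent missingness mechanism, the choice of mean versus iterative imputation, and all variable names are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch: two imputers can yield similar overall AUC yet differ for the
# group whose values are missing more often. Entirely simulated data.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, n)  # 0: majority group, 1: marginalised group
X = rng.normal(size=(n, 3))
y = (X @ np.array([1.0, -1.0, 0.5]) + 0.5 * group
     + rng.normal(size=n) > 0).astype(int)

# Group-specific missingness: feature 0 is missing far more often
# for the marginalised group (illustrative "clinical presence" pattern).
missing = rng.random(n) < np.where(group == 1, 0.6, 0.1)
X_obs = X.copy()
X_obs[missing, 0] = np.nan

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X_obs, y, group, test_size=0.3, random_state=0)

for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                      ("iterative", IterativeImputer(random_state=0))]:
    # Fit imputer and classifier on training data only, then score the
    # held-out set overall and within each group.
    clf = LogisticRegression().fit(imputer.fit_transform(X_tr), y_tr)
    scores = clf.predict_proba(imputer.transform(X_te))[:, 1]
    overall = roc_auc_score(y_te, scores)
    by_group = [roc_auc_score(y_te[g_te == g], scores[g_te == g])
                for g in (0, 1)]
    print(f"{name}: overall AUC={overall:.3f}, "
          f"group 0 AUC={by_group[0]:.3f}, group 1 AUC={by_group[1]:.3f}")
```

Reporting per-group metrics alongside the overall metric is the point of the sketch: selecting an imputer on overall AUC alone would hide any gap between the two groups.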