A growing body of work uses the paradigm of algorithmic fairness to frame techniques that anticipate and proactively mitigate the health inequities that model-guided decision-making may introduce or exacerbate. We evaluate the interplay between measures of model performance, fairness, and the expected utility of decision-making to offer practical recommendations for operationalizing algorithmic fairness principles in the development and evaluation of predictive models in healthcare. We conduct an empirical case study in which we develop models that estimate the ten-year risk of atherosclerotic cardiovascular disease to inform statin initiation in accordance with clinical practice guidelines. We demonstrate that approaches that incorporate fairness considerations into the model training objective typically do not improve model performance or confer greater net benefit for any of the studied patient populations, compared to standard learning paradigms followed by threshold selection concordant with patient preferences, evidence of intervention effectiveness, and model calibration. These results hold when the measured outcomes are not subject to differential measurement error across patient populations and threshold selection is unconstrained, regardless of whether model performance metrics, such as true and false positive rates, differ across populations. In closing, we argue that model development efforts should focus on calibrated models that predict outcomes well for all patient populations, while emphasizing that such efforts are complementary to transparent reporting, participatory design, and reasoning about the impact of model-informed interventions in context.
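To make the notion of net benefit at a decision threshold concrete, the following is a minimal sketch of the standard decision-curve form, in which false positives are weighted by the odds of the threshold, evaluated separately for each patient population. It is an illustration under stated assumptions rather than the authors' implementation: the abstract does not specify the exact utility formulation, and the function names, the group labels, and the threshold of 0.1 are all hypothetical.

```python
from collections import defaultdict


def net_benefit(y_true, risk, threshold):
    """Decision-curve net benefit of treating everyone with risk >= threshold.

    Assumed form (not confirmed by the source): TP/n - FP/n * t / (1 - t).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(y_true) != len(risk) or len(y_true) == 0:
        raise ValueError("y_true and risk must be non-empty and equal length")
    n = len(y_true)
    # Count treated patients who did (tp) and did not (fp) have the outcome.
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_by_group(y_true, risk, group, threshold):
    """Apply the same threshold within each (hypothetical) patient group."""
    buckets = defaultdict(lambda: ([], []))
    for y, r, g in zip(y_true, risk, group):
        buckets[g][0].append(y)
        buckets[g][1].append(r)
    return {g: net_benefit(ys, rs, threshold) for g, (ys, rs) in buckets.items()}


# Toy data, invented purely for illustration.
y_true = [1, 0, 1, 0, 1, 0]
risk = [0.9, 0.2, 0.05, 0.3, 0.4, 0.02]
group = ["A", "A", "A", "B", "B", "B"]
per_group = net_benefit_by_group(y_true, risk, group, threshold=0.1)
```

Because a single threshold is applied to every group, a comparison of this kind is only meaningful when the risk estimates are calibrated within each group, which is the property the abstract argues model development should prioritize.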