The management of hyperglycemia in hospitalized patients has a significant impact on both morbidity and mortality. This study uses a large clinical database to predict the need for diabetic patients to be hospitalized, which could lead to improvements in patient safety. These predictions, however, may be vulnerable to health disparities caused by social determinants such as race, age, and gender. Such biases must be removed early, at the data collection stage, before they enter the system and are reinforced by model predictions, which would bias the model's decisions. In this paper, we propose a machine learning pipeline that makes predictions and also detects and mitigates biases. The pipeline analyzes clinical data, determines whether biases exist, removes them, and then makes predictions. We demonstrate the classification accuracy and fairness of the model's predictions through experiments. The results show that mitigating biases early in the modeling process yields fairer predictions. We also find that improved fairness comes at the cost of some accuracy, a trade-off also reported in previous studies. We invite the research community to help identify additional factors that contribute to health disparities and that can be addressed through this pipeline.
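The detect-then-mitigate step described above can be illustrated with a minimal sketch. The abstract does not name the fairness metric or the mitigation technique used, so this example assumes statistical parity difference as the bias measure and reweighing (a standard pre-processing technique) as the mitigation. The toy data, the group names "A" and "B", and all function names are hypothetical.

```python
from collections import Counter

def positive_rate(rows, group, weights=None):
    """Weighted share of positive labels (label == 1) within one group."""
    if weights is None:
        weights = [1.0] * len(rows)
    total = sum(w for (g, _), w in zip(rows, weights) if g == group)
    positives = sum(w for (g, y), w in zip(rows, weights) if g == group and y == 1)
    return positives / total

def statistical_parity_difference(rows, unprivileged, privileged, weights=None):
    """Positive rate of the unprivileged group minus that of the privileged group."""
    return (positive_rate(rows, unprivileged, weights)
            - positive_rate(rows, privileged, weights))

def reweighing_weights(rows):
    """Per-row weight P(group) * P(label) / P(group, label).

    Under these weights, group membership and label are statistically
    independent in the weighted data.
    """
    n = len(rows)
    group_counts = Counter(g for g, _ in rows)
    label_counts = Counter(y for _, y in rows)
    joint_counts = Counter(rows)
    return [group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
            for g, y in rows]

# Hypothetical toy data: (protected-attribute group, binary label).
rows = [("A", 1)] * 6 + [("A", 0)] * 4 + [("B", 1)] * 2 + [("B", 0)] * 8

# Step 1: detect bias in the raw data.
spd_before = statistical_parity_difference(rows, "B", "A")

# Step 2: mitigate by reweighing, then measure again on the weighted data.
weights = reweighing_weights(rows)
spd_after = statistical_parity_difference(rows, "B", "A", weights)

print(f"statistical parity difference before: {spd_before:.3f}")
print(f"statistical parity difference after:  {spd_after:.3f}")
```

In a full pipeline, weights like these would typically be passed to a classifier that accepts per-sample weights during training, and fairness and accuracy would then be evaluated together on held-out predictions.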