The management of hyperglycemia in hospitalized patients has a significant impact on both morbidity and mortality, so it is important to predict whether diabetic patients will need to be hospitalized. However, standard machine learning approaches to this prediction task may produce health disparities because of biases in the data related to social determinants such as race, age, and gender. These biases must be removed early, during data collection, before they enter the system and are reinforced by model predictions, which would in turn bias the model's decisions. In this paper, we propose a machine learning pipeline that makes predictions and also detects and mitigates biases in both the data and the model predictions. The pipeline analyses the clinical data to determine whether biases exist; if so, it removes them before making predictions. We evaluate the proposed method on a clinical dataset using accuracy and fairness measures. The results show that mitigating biases early, during data ingestion, yields fairer predictions.
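As a rough illustration of the detect-then-mitigate idea described above, the following sketch measures bias in a labelled dataset and then reweights the samples before any model is trained. The abstract does not name a specific fairness metric or mitigation technique, so the choices here are assumptions: disparate impact as the detection measure and reweighing (weights of the form P(group) · P(label) / P(group, label)) as the mitigation step. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def disparate_impact(labels, groups, privileged, favorable=1, weights=None):
    """Ratio of (weighted) favorable-outcome rates: unprivileged / privileged.

    A value near 1.0 suggests the groups receive the favorable label at
    similar rates; values far below 1.0 suggest bias against the
    unprivileged group.
    """
    if weights is None:
        weights = [1.0] * len(labels)

    def rate(is_privileged):
        total = favorable_total = 0.0
        for y, g, w in zip(labels, groups, weights):
            if (g == privileged) == is_privileged:
                total += w
                if y == favorable:
                    favorable_total += w
        return favorable_total / total

    return rate(False) / rate(True)


def reweighing_weights(labels, groups):
    """Per-sample weights w(g, y) = P(g) * P(y) / P(g, y).

    Assumed mitigation step: after weighting, group membership and label
    are statistically independent in the training data.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Toy, synthetic data (not from the paper): two groups of four records each.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

di_before = disparate_impact(labels, groups, privileged="a")
weights = reweighing_weights(labels, groups)
di_after = disparate_impact(labels, groups, privileged="a", weights=weights)
print(f"disparate impact before: {di_before:.3f}, after: {di_after:.3f}")
```

In a full pipeline, the resulting weights would typically be passed as sample weights to whatever classifier is trained downstream, and the same fairness measure would be recomputed on the model's predictions alongside accuracy.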