Although deep learning (DL) models have shown great success in many medical image analysis tasks, deploying the resulting models in real clinical contexts requires: (1) that they exhibit robustness and fairness across different sub-populations, and (2) that the confidence in DL model predictions be accurately expressed in the form of uncertainties. Unfortunately, recent studies have shown significant biases in DL models across demographic subgroups (e.g., race, sex, age) in the context of medical image analysis, indicating a lack of fairness in the models. Although several methods have been proposed in the ML literature to mitigate a lack of fairness in DL models, they focus entirely on the absolute performance between groups without considering their effect on uncertainty estimation. In this work, we present the first exploration of how popular fairness models affect both bottom-line performance across subgroups in medical image analysis and uncertainty quantification. We perform extensive experiments on three different clinically relevant tasks: (i) skin lesion classification, (ii) brain tumour segmentation, and (iii) Alzheimer's disease clinical score regression. Our results indicate that popular ML methods, such as data-balancing and distributionally robust optimization, succeed in mitigating fairness issues in terms of model performance for some of the tasks. However, this can come at the cost of poor uncertainty estimates associated with the model predictions. This tradeoff must be mitigated if fairness models are to be adopted in medical image analysis.
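To make the two quantities discussed above concrete, the following is a minimal sketch of how per-subgroup performance and per-subgroup uncertainty quality could be compared for a classifier. It is an illustration under stated assumptions, not the evaluation protocol of this work: the function name `subgroup_report`, the use of accuracy as the performance measure, and the use of expected calibration error (ECE) as the uncertainty measure are all choices made here for the example.

```python
import numpy as np

def subgroup_report(y_true, y_pred, confidence, group, n_bins=10):
    """Per-subgroup accuracy and expected calibration error (ECE).

    Hypothetical helper for illustration only. `confidence` is the model's
    probability for its predicted class, and `group` holds one subgroup
    label (e.g. sex or age band) per sample.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    confidence = np.asarray(confidence, dtype=float)
    group = np.asarray(group)

    correct = (y_true == y_pred).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)

    report = {}
    for g in np.unique(group):
        mask = group == g
        conf_g, corr_g = confidence[mask], correct[mask]
        # Assign each sample to an equal-width confidence bin (0 .. n_bins-1).
        bins = np.digitize(conf_g, edges[1:-1], right=True)
        ece = 0.0
        for b in range(n_bins):
            in_bin = bins == b
            if in_bin.any():
                # Weight the |accuracy - confidence| gap by the bin's share of samples.
                gap = abs(corr_g[in_bin].mean() - conf_g[in_bin].mean())
                ece += in_bin.mean() * gap
        report[g.item()] = {"accuracy": float(corr_g.mean()), "ece": float(ece)}
    return report

# Toy data: two subgroups with identical stated confidence but different accuracy.
report = subgroup_report(
    y_true=[1, 1, 0, 0, 1, 0, 1, 0],
    y_pred=[1, 1, 0, 0, 1, 1, 1, 1],
    confidence=[0.9] * 8,
    group=["A", "A", "A", "A", "B", "B", "B", "B"],
)
accuracy_gap = abs(report["A"]["accuracy"] - report["B"]["accuracy"])
```

Reporting both numbers per subgroup is what exposes the tradeoff: a mitigation method can shrink `accuracy_gap` while the ECE of one or more subgroups gets worse.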