Fairness in artificial intelligence (AI) refers to assessing AI algorithms for potential bias based on demographic characteristics such as race and gender, and to developing algorithms that address this bias. Most applications to date have been in computer vision, although some work in healthcare has started to emerge. The use of deep learning (DL) in cardiac MR segmentation has led to impressive results in recent years, and such techniques are starting to be translated into clinical practice. However, no work has yet investigated the fairness of such models. In this work, we perform such an analysis for racial/gender groups, focusing on the problem of training data imbalance, using an nnU-Net model trained and evaluated on cine short-axis cardiac MR data from the UK Biobank dataset, consisting of 5,903 subjects from 6 different racial groups. We find statistically significant differences in Dice performance between different racial groups. To reduce the racial bias, we investigate three strategies: (1) stratified batch sampling, in which batch sampling is stratified to ensure balance between racial groups; (2) fair meta-learning for segmentation, in which a DL classifier is trained to classify race and jointly optimized with the segmentation model; and (3) protected group models, in which a different segmentation model is trained for each racial group. We also compare the results to the scenario in which the training database is perfectly balanced. To assess fairness, we use the standard deviation (SD) and skewed error ratio (SER) of the average Dice values. Our results demonstrate that the racial bias arises from the use of imbalanced training data, and that all proposed bias mitigation strategies improve fairness, with the best SD and SER obtained using protected group models.
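The two fairness measures named above can be illustrated with a minimal Python sketch. The abstract does not define SER, so this sketch assumes it is the ratio of the largest to the smallest per-group error, with error taken as 1 minus the group's average Dice. It also assumes a population (not sample) SD. The function name `fairness_metrics`, the group labels, and the Dice values are hypothetical and are not taken from the study.

```python
import statistics


def fairness_metrics(mean_dice_by_group):
    """Return (SD, SER) for a mapping of group name -> average Dice in [0, 1].

    Assumptions (not confirmed by the abstract):
      * SD is the population standard deviation of the per-group averages.
      * SER = max(1 - Dice) / min(1 - Dice) over groups.
    """
    dice = list(mean_dice_by_group.values())
    if len(dice) < 2:
        raise ValueError("need at least two groups to assess fairness")

    # Spread of average Dice across groups; 0 means identical performance.
    sd = statistics.pstdev(dice)

    # Per-group error, then ratio of worst to best group; 1 means no skew.
    errors = [1.0 - d for d in dice]
    smallest = min(errors)
    ser = max(errors) / smallest if smallest > 0 else float("inf")
    return sd, ser


if __name__ == "__main__":
    # Hypothetical per-group averages, for illustration only.
    example = {"group_a": 0.90, "group_b": 0.80}
    sd, ser = fairness_metrics(example)
    print(f"SD = {sd:.3f}, SER = {ser:.2f}")
```

Under these assumptions, lower SD and an SER closer to 1 both indicate more uniform segmentation performance across groups.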