Reading Race: AI Recognises Patient's Racial Identity In Medical Images

Imon Banerjee, Ananth Reddy Bhimireddy, John L. Burns, Leo Anthony Celi, Li-Ching Chen, Ramon Correa, Natalie Dullerud, Marzyeh Ghassemi, Shih-Cheng Huang, Po-Chih Kuo, Matthew P Lungren, Lyle Palmer, Brandon J Price, Saptarshi Purkayastha, Ayis Pyrros, Luke Oakden-Rayner, Chima Okechukwu, Laleh Seyyed-Kalantari, Hari Trivedi, Ryan Wang, Zachary Zaiman, Haoran Zhang, Judy W Gichoya

From arXiv; submitted to The Lancet.

Background: In medical imaging, prior studies have demonstrated disparate AI performance by race, yet there is no known correlate of race in medical images that would be obvious to the human expert interpreting them.

Methods: Using private and public datasets, we evaluate: A) how well deep learning models detect race from medical images, including their ability to generalize to external environments and across multiple imaging modalities; B) possible confounding anatomic and phenotypic population features, such as disease distribution and body habitus, as predictors of race; and C) the underlying mechanism by which AI models recognize race.

Findings: Standard deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities. Our findings hold under external validation conditions, as well as when models are optimized to perform clinically motivated tasks. We demonstrate that this detection is not due to trivial proxies or imaging-related surrogate covariates for race, such as underlying disease distribution. Finally, we show that performance persists across all anatomical regions and the full frequency spectrum of the images, suggesting that mitigation efforts will be challenging and demand further study.

Interpretation: We emphasize that a model's ability to predict self-reported race is not itself the issue of importance. However, our finding that AI can trivially predict self-reported race, even from corrupted, cropped, and noised medical images, in a setting where clinical experts cannot, creates an enormous risk for all model deployments in medical imaging: if an AI model secretly used its knowledge of self-reported race to misclassify all Black patients, radiologists would not be able to tell using the same data the model has access to.
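
The abstract reports that race prediction persists across the frequency spectrum of the images. One common way to build frequency-restricted inputs for that kind of test is an ideal low-pass or high-pass filter applied in the 2-D Fourier domain, sketched below with NumPy. This is an illustration under stated assumptions, not the authors' pipeline: the function name `frequency_filter`, the circular hard-cutoff mask, and the `radius` parameter (measured in frequency-index units from the spectrum centre) are hypothetical choices, since the abstract does not say which filter type or cutoffs were used.

```python
import numpy as np


def frequency_filter(image: np.ndarray, radius: float, mode: str = "low") -> np.ndarray:
    """Hypothetical helper: keep only spatial frequencies inside ("low") or
    outside ("high") a circle of `radius` around the zero-frequency centre.

    Assumes a 2-D grayscale image and an ideal (hard-cutoff) circular mask.
    """
    if image.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    if mode not in ("low", "high"):
        raise ValueError("mode must be 'low' or 'high'")

    rows, cols = image.shape
    # Shift the spectrum so the zero-frequency (DC) term sits at (rows//2, cols//2);
    # distance from that point is then the magnitude of the spatial frequency.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    y, x = np.ogrid[:rows, :cols]
    dist = np.hypot(y - rows // 2, x - cols // 2)

    # Low-pass keeps the inner disc; high-pass keeps everything outside it.
    mask = dist <= radius if mode == "low" else dist > radius

    # Undo the shift and invert the transform. The mask is symmetric about the
    # DC term, so the imaginary part is only floating-point residue.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real
```

Because the two masks partition the spectrum, the low-pass and high-pass outputs at the same radius sum back to the original image, and a low-pass filter with `radius=0` leaves only the image mean. In a hypothetical experiment, a trained classifier would be scored on such filtered copies at several radii to see which frequency bands still carry the predictive signal.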
