Artificial intelligence (AI) models are increasingly used in the medical domain. However, as medical data is highly sensitive, special precautions are required to ensure its protection. The gold standard for privacy preservation is the introduction of differential privacy (DP) into model training. Prior work indicates that DP has negative implications for model accuracy and fairness, which are unacceptable in medicine and represent a main barrier to the widespread use of privacy-preserving techniques. In this work, we evaluated the effect of privacy-preserving training of AI models for chest radiograph diagnosis on accuracy and fairness compared to non-private training. For this, we used a large dataset (N=193,311) of high-quality clinical chest radiographs, which were retrospectively collected and manually labeled by experienced radiologists. We then compared non-private deep convolutional neural networks (CNNs) and privacy-preserving (DP) models with respect to privacy-utility trade-offs, measured as area under the receiver operating characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or statistical parity difference. We found that the non-private CNNs achieved an average AUROC of 0.90 ± 0.04 over all labels, whereas the DP CNNs with a privacy budget of ε=7.89 achieved an AUROC of 0.87 ± 0.04, i.e., a mere 2.6% performance decrease compared to non-private training. Furthermore, we found that privacy-preserving training did not amplify discrimination with respect to age, sex, or co-morbidity. Our study shows that -- under the challenging realistic circumstances of a real-life clinical dataset -- privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.
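The two evaluation metrics named above can be stated compactly. Below is a minimal, self-contained sketch of a rank-based (Mann-Whitney) AUROC and of the statistical parity difference, SPD = P(Ŷ=1 | A=a) − P(Ŷ=1 | A=b); the function names and data layout are illustrative assumptions, not taken from the study's code.

```python
def auroc(labels, scores):
    """Rank-based AUROC (Mann-Whitney U statistic), with average ranks for ties."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # find the block of tied scores [i, j)
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j + 1) / 2.0  # average of ranks i+1 .. j
        for k in range(i, j):
            if pairs[k][1] == 1:
                rank_sum_pos += avg_rank
        i = j
    n_pos = sum(labels)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def statistical_parity_difference(preds, groups):
    """SPD between two subgroups (0/1), e.g. sex or an age bracket.

    0.0 means both subgroups receive positive predictions at the same rate.
    """
    a = [p for p, g in zip(preds, groups) if g == 0]
    b = [p for p, g in zip(preds, groups) if g == 1]
    return sum(a) / len(a) - sum(b) / len(b)
```

In a multi-label setting such as chest radiograph diagnosis, one would compute the AUROC per label and average over labels, and evaluate the SPD separately for each protected attribute (age, sex, co-morbidity).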