The availability of large-scale chest X-ray datasets is a prerequisite for developing well-performing deep learning-based algorithms for thoracic abnormality detection and classification. However, biometric identifiers in chest radiographs hinder the public sharing of such data for research purposes because of the risk of patient re-identification. To counteract this issue, synthetic data generation offers a way to anonymize medical images. This work employs a latent diffusion model to synthesize an anonymous chest X-ray dataset of high-quality class-conditional images. We propose a privacy-enhancing sampling strategy to ensure that biometric information is not transferred during the image generation process. The quality of the generated images and their feasibility as exclusive training data are evaluated on a thoracic abnormality classification task. Compared to a classifier trained on real data, we achieve competitive results with a performance gap of only 3.5% in the area under the receiver operating characteristic curve.
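The abstract does not spell out how the privacy-enhancing sampling strategy works. One plausible reading, offered here purely as an illustrative sketch and not as the authors' method, is rejection sampling: a generated image is kept only if it is not too similar to any real training image in some identity-sensitive embedding space. In the sketch below, `generate`, the embedding vectors, the cosine-similarity criterion, and the `threshold` value are all hypothetical stand-ins; a plain list of floats plays the role of an image embedding.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def privacy_filtered_sample(
    generate: Callable[[], Vector],      # hypothetical: returns the embedding of one synthetic image
    real_embeddings: List[Vector],       # hypothetical: embeddings of the real training images
    n_samples: int,
    threshold: float = 0.9,              # assumed cut-off, not taken from the paper
    max_attempts: int = 10_000,
) -> List[Vector]:
    """Keep drawing candidates and accept only those below the similarity threshold."""
    accepted: List[Vector] = []
    for _ in range(max_attempts):
        if len(accepted) == n_samples:
            break
        candidate = generate()
        # Similarity to the closest real image decides acceptance.
        closest = max(
            (cosine_similarity(candidate, real) for real in real_embeddings),
            default=0.0,
        )
        if closest < threshold:
            accepted.append(candidate)
    return accepted
```

In a setting like the one described, `generate` would wrap the latent diffusion sampler followed by an embedding network, and the threshold would trade off privacy against the fraction of samples discarded. The function may return fewer than `n_samples` items if `max_attempts` is exhausted.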