Deep neural networks have brought remarkable breakthroughs in medical image analysis. However, because they are data-hungry, the modest dataset sizes in medical imaging projects may be limiting their full potential. Generating synthetic data provides a promising alternative, making it possible to complement training datasets and conduct medical imaging research at a larger scale. Diffusion models have recently caught the attention of the computer vision community by producing photorealistic synthetic images. In this study, we explore the use of Latent Diffusion Models to generate synthetic images from high-resolution 3D brain images. We used T1w MRI images from the UK Biobank dataset (N=31,740) to train our models to learn the probability distribution of brain images, conditioned on covariates such as age, sex, and brain structure volumes. We found that our models generated realistic data and that the conditioning variables could be used to control the data generation effectively. In addition, we created a synthetic dataset of 100,000 brain images and made it openly available to the scientific community.