Generative AI has received substantial attention in recent years owing to its ability to synthesize data that closely resembles the original data source. While generative adversarial networks (GANs) have enabled innovative approaches to histopathological image analysis, they suffer from limitations such as mode collapse and discriminator overfitting. Recently, denoising diffusion models have demonstrated promising results in computer vision. These models exhibit superior training stability, better coverage of the data distribution, and produce high-quality, diverse images. They are also highly resilient to noise and perturbations, making them well suited to digital pathology, where images commonly contain artifacts and exhibit significant staining variation. In this paper, we present a novel approach, ViT-DAE, which integrates vision transformers (ViT) with diffusion autoencoders for high-quality histopathology image synthesis. This is the first time ViT has been introduced into diffusion autoencoders in computational pathology, allowing the model to better capture the complex and intricate details of histopathology images. We demonstrate the effectiveness of ViT-DAE on three publicly available datasets, where it outperforms recent GAN-based and vanilla DAE methods in generating realistic images.
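The core idea can be illustrated with a minimal NumPy sketch: a ViT-style encoder maps an image to a compact semantic latent, which then conditions a deterministic (DDIM-style) denoising loop. This is not the authors' implementation; the "ViT" here is reduced to patch embedding plus mean pooling (a stand-in for attention), the noise predictor `eps_hat` is a hypothetical placeholder, and all shapes and schedules are toy choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def vit_encode(image, patch=4, dim=8):
    """Toy ViT-style semantic encoder: patchify the image, linearly
    embed each patch, and mean-pool the tokens (a crude stand-in for
    transformer attention) into a semantic latent z_sem."""
    H, W = image.shape
    patches = image.reshape(H // patch, patch, W // patch, patch)
    patches = patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    W_embed = rng.standard_normal((patch * patch, dim)) / np.sqrt(patch * patch)
    tokens = patches @ W_embed          # (num_patches, dim)
    return tokens.mean(axis=0)          # (dim,) semantic latent

def ddim_step(x_t, z_sem, t, alpha_bar):
    """One deterministic DDIM denoising step conditioned on z_sem.
    The noise predictor is a placeholder linear map, not a trained net."""
    eps_hat = 0.1 * x_t + 0.01 * z_sem.sum()        # hypothetical eps(x_t, t, z_sem)
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])
    a_prev = alpha_bar[t - 1] if t > 0 else 1.0
    return np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat

# Encode a (toy) histopathology patch into a semantic latent,
# then decode from noise conditioned on that latent.
image = rng.standard_normal((16, 16))
z = vit_encode(image)

alpha_bar = np.linspace(0.99, 0.01, 10)   # decreasing noise schedule
x = rng.standard_normal((16, 16))         # start from pure noise
for t in range(len(alpha_bar) - 1, -1, -1):
    x = ddim_step(x, z, t, alpha_bar)
```

The design choice this sketches is the one the abstract describes: the semantic content of the image lives in `z_sem` (produced by the ViT encoder), while the diffusion process handles stochastic, low-level detail, which is what makes the autoencoder's latent both compact and semantically meaningful.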