The lack of sufficiently large open medical databases is one of the biggest challenges in AI-powered healthcare. Synthetic data created using Generative Adversarial Networks (GANs) appears to be a good way to mitigate the issues raised by privacy policies. Another remedy is a decentralized protocol across multiple medical institutions that does not exchange local data samples. In this paper, we explored unconditional and conditional GANs in centralized and decentralized settings. The centralized setting imitates studies on a large but highly unbalanced skin lesion dataset, while the decentralized one simulates a more realistic hospital scenario with three institutions. We evaluated the models' performance in terms of fidelity, diversity, speed of training, and the predictive ability of classifiers trained on the generated synthetic data. In addition, we provided explainability through exploration of the latent space and embedding projection, focusing on both global and local explanations. The calculated distances between real images and their projections in the latent space demonstrated the authenticity and generalization of the trained GANs, which is one of the main concerns in this type of application. The open-source code for the conducted studies is publicly available at \url{https://github.com/aidotse/stylegan2-ada-pytorch}.
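To illustrate what training across institutions without exchanging local data samples can look like, the sketch below shows one common aggregation rule, federated averaging, applied to three hypothetical institutions. This is an assumption made for illustration: the abstract does not state which aggregation rule the decentralized setting uses, and the function name `federated_average`, the parameter values, and the sample counts are all invented here.

```python
from typing import List, Sequence


def federated_average(
    client_params: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average per-client parameters, weighted by local dataset size.

    Only model parameters are shared with the aggregator; the local
    data samples never leave the institution that owns them.
    """
    total = float(sum(client_sizes))
    n_params = len(client_params[0])
    return [
        # Each client contributes in proportion to its share of all samples.
        sum(params[i] * (n / total) for params, n in zip(client_params, client_sizes))
        for i in range(n_params)
    ]


# Hypothetical flattened model parameters from three institutions
# after one round of local training (illustrative values only).
clients = [
    [1.0, 0.0],
    [2.0, 1.0],
    [4.0, 2.0],
]
# Hypothetical number of local training images per institution.
sizes = [100, 100, 200]

global_params = federated_average(clients, sizes)
print(global_params)
```

In a GAN setting, the same weighted average would be applied to the generator and discriminator parameters separately after each round of local training, and the result would be sent back to every institution as the starting point for the next round.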