Sharing data from clinical studies can facilitate innovative data-driven research and ultimately lead to better public health. However, sharing biomedical data can put sensitive personal information at risk. This risk is usually addressed by anonymization, which is a slow and expensive process. An alternative to anonymization is to share a synthetic dataset that behaves similarly to the real data while preserving privacy. As part of the collaboration between Novartis and the Oxford Big Data Institute, we generate a synthetic dataset based on a COSENTYX (secukinumab) Ankylosing Spondylitis (AS) clinical study. We apply an Auxiliary Classifier GAN (ac-GAN) to generate synthetic magnetic resonance images (MRIs) of vertebral units (VUs). The images are conditioned on the VU location (cervical, thoracic, or lumbar). In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity, and dataset privacy.