Sensitive medical data is often subject to strict usage constraints. In this paper, we trained a generative adversarial network (GAN) on real-world electronic health records (EHR) and then used it to create a data-set of "fake" patients through synthetic data generation (SDG), circumventing these usage constraints. The real-world data was tabular, binary intensive care unit (ICU) patient diagnosis data. The entire data-set was split into separate data silos to mimic real-world scenarios in which multiple ICUs across different hospitals may hold similarly structured data-sets within their own organisations but do not have access to each other's data-sets. We implemented federated learning (FL) to train a separate GAN locally at each organisation on its own data silo and then combined these GANs into a single central GAN, without any siloed data ever being exposed. This global, central GAN was then used to generate the synthetic patient data-set. We evaluated these synthetic patients with statistical measures and through a structured review by a group of medical professionals. We found no significant reduction in the quality of the synthetic EHR when moving from training a single central model to training individual models on separate data silos and then combining them into a central model. This held for both the statistical evaluation (Root Mean Square Error (RMSE) of 0.0154 for single-source vs. 0.0169 for dual-source federated training) and the medical professionals' evaluation (no quality difference between EHR generated from a single source and EHR generated from multiple sources).
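The abstract does not say how the local GANs are combined or exactly which quantity the RMSE is computed over. The sketch below therefore rests on two labelled assumptions. First, the combination step resembles FedAvg: each silo's flattened generator parameters are averaged, weighted by silo size, and only one aggregation round is shown, whereas real FL repeats this every round. Second, the RMSE compares the per-diagnosis frequency of 1s in the real and synthetic binary tables. The names `fed_avg`, `column_frequencies` and `dimension_wise_rmse` are hypothetical.

```python
import math
from typing import Sequence


def fed_avg(silo_params: Sequence[Sequence[float]], silo_sizes: Sequence[int]) -> list[float]:
    """Hypothetical FedAvg-style step: average flattened parameter vectors,
    weighting each silo by its number of patients. No patient rows are shared."""
    if not silo_params or len(silo_params) != len(silo_sizes):
        raise ValueError("need exactly one parameter vector per silo")
    n_params = len(silo_params[0])
    if any(len(p) != n_params for p in silo_params):
        raise ValueError("all silos must use the same model architecture")
    total = sum(silo_sizes)
    return [
        sum(size * params[i] for params, size in zip(silo_params, silo_sizes)) / total
        for i in range(n_params)
    ]


def column_frequencies(rows: Sequence[Sequence[int]]) -> list[float]:
    """Fraction of patients with each binary diagnosis flag set to 1."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dimension_wise_rmse(real: Sequence[Sequence[int]], synthetic: Sequence[Sequence[int]]) -> float:
    """Assumed metric: RMSE between per-column frequencies of real and synthetic data."""
    p = column_frequencies(real)
    q = column_frequencies(synthetic)
    if len(p) != len(q):
        raise ValueError("real and synthetic data must have the same columns")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))


if __name__ == "__main__":
    # Two toy silos with 100 and 300 patients; parameters are illustrative only.
    print(fed_avg([[0.25, 0.5], [0.75, 0.0]], [100, 300]))  # [0.625, 0.125]

    real = [[1, 0], [1, 1], [0, 0], [1, 0]]
    synthetic = [[1, 1], [1, 1], [0, 0], [1, 0]]
    print(round(dimension_wise_rmse(real, synthetic), 4))  # 0.1768
```

Weighting by silo size means a larger hospital has proportionally more influence on the central model. This is a common default for FedAvg, but it is an assumption here, not a detail stated in the abstract.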