Electronic Health Records (EHRs) are a valuable asset for clinical research and point-of-care applications; however, many challenges, such as data privacy concerns, impede their optimal utilization. Deep generative models, particularly Generative Adversarial Networks (GANs), show great promise for generating synthetic EHR data: they learn the underlying data distributions, achieve excellent performance, and help address these challenges. This work reviews the major developments in applications of GANs to EHRs and provides an overview of the proposed methodologies. To this end, we combine perspectives from healthcare applications and machine learning techniques, considering the source datasets and the fidelity and privacy evaluation of the generated synthetic datasets. We also compile a list of the metrics and datasets used by the reviewed works, which can serve as benchmarks for future research in the field. We conclude by discussing challenges in developing GANs for EHRs and by proposing recommended practices. We hope that this work motivates novel research directions at the intersection of healthcare and machine learning.