Evaluation of the Synthetic Electronic Health Records

Generative models have been found effective for data synthesis because they can capture complex underlying data distributions. The quality of the data they generate is commonly evaluated by visual inspection for image datasets or by downstream analytical tasks for tabular datasets. These evaluation methods neither measure the implicit data distribution nor consider data privacy, and how to compare and rank different generative models remains an open question. Medical data can be sensitive, so it is important to address patients' privacy concerns while maintaining the utility of the synthetic dataset. Beyond utility evaluation, this work outlines two metrics, Similarity and Uniqueness, for sample-wise assessment of synthetic datasets. We demonstrate the proposed notions with several state-of-the-art generative models used to synthesise Cystic Fibrosis (CF) patients' electronic health records (EHRs), observing that the proposed metrics are suitable for synthetic data evaluation and generative model comparison.
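The abstract does not define Similarity or Uniqueness, so the following is only a minimal sketch of what a sample-wise assessment could look like. The definitions here are assumptions made for illustration and are not the paper's: Similarity is taken as the mean of 1 / (1 + d), where d is the Euclidean distance from a synthetic record to its nearest real record, and Uniqueness as the share of synthetic records that are not copies of any real record. The function names and the `tol` parameter are hypothetical.

```python
import math


def nearest_distance(record, reference):
    """Euclidean distance from one record to its closest record in `reference`."""
    return min(math.dist(record, other) for other in reference)


def similarity(synthetic, real):
    """Assumed definition (not from the paper): mean of 1 / (1 + nearest-real distance)."""
    scores = [1.0 / (1.0 + nearest_distance(s, real)) for s in synthetic]
    return sum(scores) / len(scores)


def uniqueness(synthetic, real, tol=1e-9):
    """Assumed definition (not from the paper): share of synthetic records
    whose nearest real record is farther away than `tol`, i.e. not a copy."""
    novel = [s for s in synthetic if nearest_distance(s, real) > tol]
    return len(novel) / len(synthetic)


# Toy numeric records standing in for encoded EHR rows (made-up values).
real = [(0.0, 0.0), (1.0, 1.0)]
synthetic = [(0.0, 0.0), (1.0, 2.0)]

print(similarity(synthetic, real))
print(uniqueness(synthetic, real))
```

Under these assumed definitions, a synthetic dataset that copies real records scores high on Similarity but low on Uniqueness, which is the utility versus privacy tension the abstract describes.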

