MedDiff: Generating Electronic Health Records using Accelerated Denoising Diffusion Model

Due to patient privacy concerns, machine learning research in healthcare has been slower and more limited than in other application domains. High-quality, realistic synthetic electronic health records (EHRs) can accelerate methodological development for research while mitigating the privacy concerns associated with data sharing. The current state-of-the-art models for synthetic EHR generation are generative adversarial networks, which are notoriously difficult to train and can suffer from mode collapse. Denoising Diffusion Probabilistic Models, a class of generative models inspired by statistical thermodynamics, have recently been shown to generate high-quality synthetic samples in certain domains. It is unknown whether they generalize to the generation of large-scale, high-dimensional EHRs. In this paper, we present a novel generative model based on diffusion models, which is the first successful application of diffusion models to electronic health records. Our model includes a mechanism for class-conditional sampling that preserves label information. We also introduce a new sampling strategy to accelerate inference. We empirically show that our model outperforms existing state-of-the-art synthetic EHR generation methods.

