Synthetic data generation is a promising way to address privacy issues in the distribution of sensitive health data. Recently, diffusion models have set new standards for generative modeling across different data modalities. Also very recently, structured state space models have emerged as a powerful modeling paradigm for capturing long-term dependencies in time series. We put forward SSSD-ECG, which combines these two technologies, for the generation of synthetic 12-lead electrocardiograms conditioned on more than 70 ECG statements. Due to a lack of reliable baselines, we also propose conditional variants of two state-of-the-art unconditional generative models. We thoroughly evaluate the quality of the generated samples, both by evaluating pretrained classifiers on the generated data and by measuring the performance of a classifier trained only on synthetic data; here, SSSD-ECG clearly outperforms its GAN-based competitors. We demonstrate the soundness of our approach through further experiments, including conditional class interpolation and a clinical Turing test, which confirm the high quality of the SSSD-ECG samples across a wide range of conditions.