ECG databases are usually highly imbalanced because normal ECGs are abundant and abnormal cases are scarce. As a result, deep learning classifiers trained on such datasets usually perform poorly, especially on minor classes. One solution is to generate realistic synthetic ECG signals with Generative Adversarial Networks (GANs) to augment imbalanced datasets. In this study, we combined a conditional GAN with WGAN-GP and developed, for the first time, a 1D form of AC-WGAN-GP for application to the MIT-BIH Arrhythmia dataset. We investigated the impact of data augmentation on arrhythmia classification. We employed two models for ECG generation: (i) an unconditional GAN, in which a Wasserstein GAN with gradient penalty (WGAN-GP) is trained on each class individually; and (ii) a conditional GAN, in which one Auxiliary Classifier WGAN-GP (AC-WGAN-GP) model is trained on all classes and then used to generate synthetic beats for all classes. Two scenarios are defined for each model: (a) unscreened, in which all generated synthetic beats are used; and (b) screened, in which only a portion of the generated beats is selected and used, based on their Dynamic Time Warping (DTW) distance to a designated template. A state-of-the-art ResNet classifier (EcgResNet34) is trained on each of the augmented datasets, and the performance metrics (micro- and macro-averaged precision, recall, and F1-score; confusion matrices; multiclass precision-recall curves) are compared with those of the unaugmented imbalanced case. We also used a simple metric, Net Improvement. All three metrics consistently show that, in terms of net improvement (total and minor-class), the unconditional GAN with raw (unscreened) generated data yields the best improvements.
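The screened scenario can be illustrated with a minimal sketch, assuming beats are equal-rate 1D arrays and that "a portion" means the fraction of beats closest to the template. The abstract specifies neither the selection rule nor how the template is designated, so the function names `dtw_distance` and `screen_beats` and the `keep_fraction` parameter are hypothetical and are not the study's implementation.

```python
import numpy as np


def dtw_distance(a, b):
    """Textbook dynamic-programming DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i, j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # repeat b[j-1]
                                 cost[i, j - 1],      # repeat a[i-1]
                                 cost[i - 1, j - 1])  # advance both
    return float(cost[n, m])


def screen_beats(beats, template, keep_fraction=0.5):
    """Keep the beats with the smallest DTW distance to the template.

    keep_fraction is an assumed selection rule, chosen for illustration only.
    Returns the kept beats (in their original order) and all distances.
    """
    distances = np.array([dtw_distance(b, template) for b in beats])
    n_keep = max(1, int(round(keep_fraction * len(beats))))
    keep_idx = np.argsort(distances)[:n_keep]
    return [beats[i] for i in sorted(keep_idx)], distances
```

In the unscreened scenario this step is simply skipped and every generated beat is added to the training set. The quadratic loop is adequate for short single-beat segments; a banded or library-based DTW would be a natural substitute for longer signals.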