Electrocardiogram (ECG) datasets tend to be highly imbalanced because abnormal cases are scarce. In addition, the use of real patients' ECGs is tightly regulated for privacy reasons. There is therefore a constant need for more ECG data, especially for training automatic-diagnosis machine learning models, which perform better when trained on a balanced dataset. We studied the synthetic ECG generation capability of five models from the generative adversarial network (GAN) family and compared their performance, focusing only on Normal cardiac cycles. Dynamic Time Warping (DTW), Fréchet, and Euclidean distance functions were employed to measure performance quantitatively. Five different methods for evaluating generated beats were proposed and applied. We also proposed three new concepts (threshold, accepted beat, and productivity rate) and used them together with these methods as a systematic way to compare the models. The results show that all the tested models can, to an extent, successfully mass-generate acceptable heartbeats with high similarity in morphological features, and that all of them can potentially be used to augment imbalanced datasets. However, visual inspection of the generated beats favors BiLSTM-DC GAN and WGAN, as they produce statistically more acceptable beats. With regard to productivity rate, the Classic GAN is superior, at 72%. We also designed a simple experiment with a state-of-the-art classifier (ECGResNet34) to show empirically that augmenting the imbalanced dataset with synthetic ECG signals can significantly improve classification performance.
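To make the evaluation vocabulary concrete, the following is a minimal Python sketch of how a distance function, a threshold, and a productivity rate could fit together. It is an illustration under stated assumptions, not the procedure used in the study: the abstract does not define the three concepts precisely, so the sketch assumes that a generated beat is "accepted" when its distance to a single reference (template) beat is at or below the threshold, and that the productivity rate is the fraction of generated beats that are accepted. The function names, the toy beats, and the threshold value are all hypothetical. The Fréchet distance is omitted for brevity.

```python
import math


def euclidean_distance(a, b):
    """Point-wise Euclidean distance between two equal-length beats."""
    if len(a) != len(b):
        raise ValueError("beats must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Classic dynamic time warping with |x - y| as the local cost.

    Runs in O(len(a) * len(b)) time and allows beats of different lengths.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def productivity_rate(generated, template, threshold, distance=dtw_distance):
    """ASSUMED definition: share of generated beats with distance <= threshold."""
    if not generated:
        return 0.0
    accepted = sum(1 for beat in generated if distance(beat, template) <= threshold)
    return accepted / len(generated)


# Toy data (hypothetical, not real ECG samples).
template = [0.0, 0.1, 1.0, 0.1, 0.0]
generated = [
    [0.0, 0.1, 1.0, 0.1, 0.0],       # identical to the template
    [0.0, 0.0, 0.1, 1.0, 0.1, 0.0],  # same shape, shifted by one sample
    [0.5, 0.5, 0.5, 0.5, 0.5],       # flat line, no recognizable peak
]

rate = productivity_rate(generated, template, threshold=0.5)
print(f"productivity rate: {rate:.2f}")
```

In this toy setting DTW tolerates the one-sample shift of the second beat, whereas a point-wise Euclidean comparison would require equal lengths. This illustrates why an elastic measure can be a useful complement when comparing beat morphology.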