Medical datasets often suffer from data scarcity because ground-truth labels must be produced by medical professionals. One mitigation strategy is to pretrain deep learning models on large, unlabelled datasets with self-supervised learning (SSL). Data augmentations are essential for improving the generalizability of SSL-trained models, but they are typically handcrafted and tuned manually. We use an adversarial model to generate masks as augmentations for 12-lead electrocardiogram (ECG) data, where the masks learn to occlude diagnostically relevant regions of the ECGs. Compared to random augmentations, adversarial masking achieves better accuracy when transferring to two diverse downstream objectives: arrhythmia classification and gender classification. Compared to 3KG, a state-of-the-art ECG augmentation method, adversarial masking performs better in data-scarce regimes, demonstrating the generalizability of our model.
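The core operation, occluding parts of a 12-lead ECG with a mask and preferring masks that make the pretraining task harder, can be illustrated with a minimal pure-Python sketch. This is not the authors' model. In the abstract the adversary is a learned network, whereas here the hypothetical `adversarial_pick` just searches a fixed list of candidate masks for the one that maximizes a loss, and the hypothetical `removed_energy` stands in for the real SSL objective. `random_window_mask` plays the role of the random-augmentation baseline. All names and the leads × samples list representation are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence

# Assumed representation: an ECG is a list of leads, each a list of samples.
# A 12-lead recording has len(ecg) == 12; the sample count is arbitrary.
Signal = List[List[float]]


def apply_mask(ecg: Signal, mask: Signal) -> Signal:
    """Occlude an ECG with an element-wise mask (0 = hidden, 1 = kept)."""
    if len(ecg) != len(mask) or any(len(a) != len(b) for a, b in zip(ecg, mask)):
        raise ValueError("mask must have the same leads x samples shape as the ECG")
    return [[x * m for x, m in zip(lead, m_lead)] for lead, m_lead in zip(ecg, mask)]


def random_window_mask(n_leads: int, n_samples: int, width: int,
                       rng: random.Random) -> Signal:
    """Random-augmentation baseline: zero one contiguous window in every lead."""
    if not 0 < width <= n_samples:
        raise ValueError("width must be in (0, n_samples]")
    mask = []
    for _ in range(n_leads):
        start = rng.randrange(n_samples - width + 1)
        mask.append([0.0 if start <= t < start + width else 1.0
                     for t in range(n_samples)])
    return mask


def removed_energy(original: Signal, augmented: Signal) -> float:
    """Hypothetical proxy loss: squared signal energy hidden by the mask."""
    return sum((x - y) ** 2
               for a, b in zip(original, augmented)
               for x, y in zip(a, b))


def adversarial_pick(ecg: Signal, candidates: Sequence[Signal],
                     loss_fn: Callable[[Signal, Signal], float]) -> Signal:
    """Toy adversary: return the candidate mask that maximizes loss_fn.

    A learned mask generator would instead be trained (e.g. by gradient
    ascent) on the SSL loss; exhaustive search over candidates is a stand-in.
    """
    return max(candidates, key=lambda m: loss_fn(ecg, apply_mask(ecg, m)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Fake 12-lead ECG with a large "R-peak" at sample index 2 in every lead.
    ecg = [[0.0, 0.1, 1.5, 0.2, 0.0, 0.1] for _ in range(12)]
    candidates = [random_window_mask(12, 6, 2, rng) for _ in range(8)]
    hardest = adversarial_pick(ecg, candidates, removed_energy)
    print(apply_mask(ecg, hardest)[0])
```

With the energy proxy, the toy adversary favours masks that hide the high-amplitude peak, which loosely mirrors the idea of masks learning to occlude diagnostically relevant regions; a real implementation would replace the proxy with the SSL loss and the search with a trained generator.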