The electrocardiogram (ECG) is one of the most commonly used non-invasive and convenient medical monitoring tools for assisting the clinical diagnosis of heart disease. Recently, deep learning (DL) techniques, particularly self-supervised learning (SSL), have demonstrated great potential in ECG classification. SSL pre-training achieves competitive performance after fine-tuning with only a small amount of annotated data. However, current SSL methods rely on the availability of annotated data and are unable to predict labels that do not exist in the fine-tuning dataset. To address this challenge, we propose Multimodal ECG-Text Self-supervised pre-training (METS), the first work to use automatically generated clinical reports to guide ECG SSL pre-training. We use a trainable ECG encoder and a frozen language model to separately embed paired ECGs and their automatically generated clinical reports. The SSL objective maximizes the similarity between an ECG and its paired report while minimizing the similarity between the ECG and other reports. In downstream classification tasks, METS achieves around a 10% improvement in performance via zero-shot classification without using any annotated data, compared to supervised and SSL baselines that rely on annotated data. Furthermore, METS achieves the highest recall and F1 scores on the MIT-BIH dataset, even though MIT-BIH contains ECG classes that differ from those in the pre-training dataset. Extensive experiments demonstrate the advantages of ECG-Text multimodal self-supervised learning in terms of generalizability, effectiveness, and efficiency.
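The pre-training objective described above is a contrastive loss between ECG and report embeddings: matched pairs are pulled together and mismatched pairs are pushed apart. The snippet below is a minimal sketch of such an objective, assuming PyTorch; the tensor names `ecg_emb` and `text_emb` (outputs of the trainable ECG encoder and the frozen language model, projected to a shared dimension) and the temperature value are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the official METS code) of a symmetric contrastive
# objective over paired (ECG, report) embeddings.
import torch
import torch.nn.functional as F

def contrastive_loss(ecg_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    ecg_emb:  (B, D) embeddings from the trainable ECG encoder
    text_emb: (B, D) embeddings from the frozen language model (projected to D)
    """
    ecg_emb = F.normalize(ecg_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Cosine-similarity logits: entry (i, j) compares ECG i with report j.
    logits = ecg_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Maximize similarity on the diagonal (paired ECG and report) and
    # minimize it off the diagonal, in both matching directions.
    loss_e2t = F.cross_entropy(logits, targets)
    loss_t2e = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_e2t + loss_t2e)
```

Under this formulation, zero-shot classification can be performed by embedding a short textual description of each candidate class with the same frozen language model and assigning the class whose text embedding has the highest cosine similarity to the ECG embedding, requiring no annotated ECG data.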