The electrocardiogram (ECG) is one of the most commonly used non-invasive and convenient medical monitoring tools for assisting the clinical diagnosis of heart disease. Recently, deep learning (DL) techniques, particularly self-supervised learning (SSL), have demonstrated great potential in ECG classification. SSL pre-training achieves competitive performance after fine-tuning with only a small amount of annotated data. However, current SSL methods rely on the availability of annotated data and are unable to predict labels that do not exist in the fine-tuning dataset. To address this challenge, we propose Multimodal ECG-Text Self-supervised pre-training (METS), the first work to use automatically generated clinical reports to guide ECG SSL pre-training. We use a trainable ECG encoder and a frozen language model to separately embed paired ECGs and their automatically generated clinical reports. The SSL objective maximizes the similarity between an ECG and its paired report while minimizing the similarity between the ECG and other reports. In downstream classification tasks, METS achieves around a 10% improvement in performance via zero-shot classification without using any annotated data, compared to supervised and SSL baselines that rely on annotated data. Furthermore, METS achieves the highest recall and F1 scores on the MIT-BIH dataset, even though MIT-BIH contains ECG classes that differ from those in the pre-training dataset. Extensive experiments demonstrate the advantages of ECG-Text multimodal self-supervised learning in terms of generalizability, effectiveness, and efficiency.
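The pre-training objective described above is a contrastive loss between ECG and report embeddings: matched pairs are pulled together and mismatched pairs are pushed apart. The snippet below is a minimal sketch of such an objective, assuming PyTorch; the tensor names `ecg_emb` and `text_emb` (outputs of the trainable ECG encoder and the frozen language model, projected to a shared dimension) and the temperature value are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the official METS code) of a symmetric contrastive
# objective over paired (ECG, report) embeddings.
import torch
import torch.nn.functional as F

def contrastive_loss(ecg_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    ecg_emb:  (B, D) embeddings from the trainable ECG encoder
    text_emb: (B, D) embeddings from the frozen language model (projected to D)
    """
    ecg_emb = F.normalize(ecg_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Cosine-similarity logits: entry (i, j) compares ECG i with report j.
    logits = ecg_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Maximize similarity on the diagonal (paired ECG and report) and
    # minimize it off the diagonal, in both matching directions.
    loss_e2t = F.cross_entropy(logits, targets)
    loss_t2e = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_e2t + loss_t2e)
```

Under this formulation, zero-shot classification can be performed by embedding a short textual description of each candidate class with the same frozen language model and assigning the class whose text embedding has the highest cosine similarity to the ECG embedding, requiring no annotated ECG data.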