Automated interpretation of electrocardiograms (ECGs) has attracted significant attention as machine learning methods have advanced. Most current studies, however, focus solely on classification or regression tasks and overlook a crucial aspect of clinical cardiac disease diagnosis: the diagnostic report written by experienced human clinicians. In this paper, we introduce a novel approach to ECG interpretation that leverages recent breakthroughs in Large Language Models (LLMs) and Vision-Transformer (ViT) models. Rather than treating ECG diagnosis as a classification or regression task, we propose automatically identifying the most similar clinical cases based on the input ECG data. Because interpreting ECGs as images is more affordable and accessible, we process ECGs as encoded images and adopt a vision-language learning paradigm to jointly learn the alignment between encoded ECG images and ECG diagnosis reports. Encoding ECGs as images can yield an efficient ECG retrieval system that is practical in clinical applications. More importantly, our findings could serve as a crucial resource for providing diagnostic services in regions where only paper-printed ECG images are accessible due to past underdevelopment.
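The retrieval step described in the abstract, finding the clinical reports most similar to an input ECG, can be illustrated with a minimal sketch. The abstract does not specify the encoders, the embedding size, or the similarity measure, so everything here is an assumption: cosine similarity is used as a common choice for aligned vision-language embeddings, the 3-dimensional vectors are hand-made placeholders standing in for the outputs of trained image and text encoders, and the names `retrieve_reports` and `report_index` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve_reports(ecg_embedding, report_index, top_k=2):
    """Rank stored reports by similarity to an ECG image embedding.

    `report_index` is a list of (report_text, embedding) pairs. In a real
    system both embeddings would come from jointly trained encoders; here
    they are placeholders (assumption, not the paper's implementation).
    """
    scored = [
        (cosine_similarity(ecg_embedding, emb), text)
        for text, emb in report_index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy index: hand-made vectors, not real model outputs.
report_index = [
    ("sinus rhythm, normal ECG", [0.9, 0.1, 0.0]),
    ("atrial fibrillation", [0.1, 0.9, 0.1]),
    ("left bundle branch block", [0.0, 0.2, 0.9]),
]

# Placeholder embedding for an encoded ECG image.
query = [0.2, 0.8, 0.1]
results = retrieve_reports(query, report_index, top_k=2)

for score, text in results:
    print(f"{score:.3f}  {text}")
```

Normalizing both vectors first means the ranking depends only on direction, not magnitude, which is the usual reason cosine similarity is paired with contrastively trained embeddings. A practical system would likely replace the linear scan with an approximate nearest-neighbor index.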