Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data

Medical datasets, and especially biobanks, often contain extensive tabular data with rich clinical information in addition to images. In practice, clinicians typically have less data, both in terms of diversity and scale, but still wish to deploy deep learning solutions. Combined with growing medical dataset sizes and high annotation costs, this has increased the need for unsupervised methods that can pretrain multimodally and predict unimodally. To address these needs, we propose the first self-supervised contrastive learning framework that takes advantage of images and tabular data to train unimodal encoders. Our solution combines SimCLR and SCARF, two leading contrastive learning strategies, and is simple and effective. In our experiments, we demonstrate the strength of our framework by predicting risks of myocardial infarction and coronary artery disease (CAD) using cardiac MR images and 120 clinical features from 40,000 UK Biobank subjects. Furthermore, we show the generalizability of our approach to natural images using the DVM car advertisement dataset. We take advantage of the high interpretability of tabular data and, through attribution and ablation experiments, find that morphometric tabular features, which describe size and shape, have outsized importance during the contrastive learning process and improve the quality of the learned embeddings. Finally, we introduce a novel form of supervised contrastive learning, label as a feature (LaaF), in which the ground truth label is appended as a tabular feature during multimodal pretraining; it outperforms all supervised contrastive baselines.
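To make the ingredients named in the abstract more concrete, here is a minimal pure-Python sketch of three pieces: a SCARF-style tabular corruption (a random subset of features is resampled from each feature's empirical marginal), the LaaF idea of appending the label as an extra tabular feature, and a symmetric InfoNCE loss that pulls each image embedding toward the tabular embedding of the same subject. The function names, the corruption rate of 0.3, the temperature of 0.1, and the choice of a symmetric (CLIP-style) loss are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math
import random


def scarf_corrupt(row, table, corruption_rate=0.3, rng=random):
    """SCARF-style view: resample a random subset of features.

    Each corrupted feature j is replaced by the value of feature j taken
    from a randomly chosen row of `table` (the empirical marginal).
    The corruption rate is an assumed hyperparameter.
    """
    n_features = len(row)
    n_corrupt = int(round(corruption_rate * n_features))
    corrupt_idx = set(rng.sample(range(n_features), n_corrupt))
    return [
        rng.choice(table)[j] if j in corrupt_idx else value
        for j, value in enumerate(row)
    ]


def append_label_feature(row, label):
    """LaaF sketch: treat the ground-truth label as one more tabular feature."""
    return list(row) + [float(label)]


def _normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(image_emb, tabular_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired embeddings.

    Row i of `image_emb` and row i of `tabular_emb` belong to the same
    subject (positive pair); all other pairings in the batch are negatives.
    """
    img = [_normalize(v) for v in image_emb]
    tab = [_normalize(v) for v in tabular_emb]
    # Cosine similarity matrix, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(u, w)) / temperature for w in tab]
        for u in img
    ]
    logits_t = [list(col) for col in zip(*logits)]  # tabular-to-image direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))
```

In a real pipeline the embeddings would come from an image encoder fed with SimCLR-style augmented images and a tabular encoder fed with the corrupted rows. After pretraining, either encoder can be used on its own, which is what allows unimodal prediction at deployment time.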
