Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data

Medical datasets, and especially biobanks, often contain extensive tabular data with rich clinical information in addition to images. In practice, clinicians typically have less data, in terms of both diversity and scale, but still wish to deploy deep learning solutions. Combined with growing medical dataset sizes and expensive annotation costs, this has increased the need for unsupervised methods that can pretrain multimodally and predict unimodally. To address these needs, we propose the first self-supervised contrastive learning framework that takes advantage of images and tabular data to train unimodal encoders. Our solution combines SimCLR and SCARF, two leading contrastive learning strategies, and is simple and effective. In our experiments, we demonstrate the strength of our framework by predicting risks of myocardial infarction and coronary artery disease (CAD) using cardiac MR images and 120 clinical features from 40,000 UK Biobank subjects. Furthermore, we show that our approach generalizes to natural images using the DVM car advertisement dataset. We take advantage of the high interpretability of tabular data and, through attribution and ablation experiments, find that morphometric tabular features, which describe size and shape, have outsized importance during the contrastive learning process and improve the quality of the learned embeddings. Finally, we introduce a novel form of supervised contrastive learning, label as a feature (LaaF), which appends the ground truth label as a tabular feature during multimodal pretraining and outperforms all supervised contrastive baselines.
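To make the ingredients named in the abstract more concrete, here is a minimal sketch in plain Python of three of them: a SCARF-style corruption of tabular rows, the LaaF step of appending the label as an extra tabular column, and a symmetric InfoNCE-style loss between image and tabular embeddings. This is an illustration under stated assumptions, not the authors' implementation. The function names, the corruption rate of 0.3, the temperature of 0.1, and the equal weighting of the two loss directions are all hypothetical choices, and the embeddings are assumed to come from encoders that are not shown.

```python
import math
import random


def scarf_corrupt(rows, corruption_rate=0.3, rng=None):
    """SCARF-style view: for each row, overwrite a random subset of features
    with the value of the same feature taken from a randomly chosen row.
    The corruption rate is an assumed value, not taken from the abstract."""
    rng = rng or random.Random(0)
    n_features = len(rows[0])
    n_corrupt = int(round(corruption_rate * n_features))
    corrupted = []
    for row in rows:
        new_row = list(row)
        for j in rng.sample(range(n_features), n_corrupt):
            new_row[j] = rng.choice(rows)[j]  # sample from column j's empirical marginal
        corrupted.append(new_row)
    return corrupted


def append_label_as_feature(rows, labels):
    """LaaF as described in the abstract: the ground truth label becomes
    one more tabular feature during multimodal pretraining."""
    return [list(row) + [float(y)] for row, y in zip(rows, labels)]


def l2_normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def info_nce(anchors, candidates, temperature=0.1):
    """Mean cross-entropy in which candidate i is the positive for anchor i
    and every other candidate in the batch is a negative."""
    a = [l2_normalize(v) for v in anchors]
    c = [l2_normalize(v) for v in candidates]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in c]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def multimodal_contrastive_loss(image_emb, tabular_emb, temperature=0.1, weight=0.5):
    """Symmetric loss: image-to-tabular plus tabular-to-image.
    The weighting between the two directions is an assumption."""
    return (weight * info_nce(image_emb, tabular_emb, temperature)
            + (1.0 - weight) * info_nce(tabular_emb, image_emb, temperature))
```

In a training loop of this shape, each subject's augmented image and corrupted tabular row would be encoded separately, and the loss would pull the two embeddings of the same subject together while pushing apart embeddings of different subjects in the batch. Either encoder can then be used on its own at prediction time, which is the "pretrain multimodally, predict unimodally" setting the abstract describes.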
