Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data

Medical datasets, and especially biobanks, often contain extensive tabular data with rich clinical information in addition to images. In practice, clinicians typically have less data, in terms of both diversity and scale, but still wish to deploy deep learning solutions. Combined with increasing medical dataset sizes and expensive annotation costs, this has increased the need for unsupervised methods that can pretrain multimodally and predict unimodally. To address these needs, we propose the first self-supervised contrastive learning framework that takes advantage of images and tabular data to train unimodal encoders. Our solution combines SimCLR and SCARF, two leading contrastive learning strategies, and is simple and effective. In our experiments, we demonstrate the strength of our framework by predicting risks of myocardial infarction and coronary artery disease (CAD) using cardiac MR images and 120 clinical features from 40,000 UK Biobank subjects. Furthermore, we show the generalizability of our approach to natural images using the DVM car advertisement dataset. We take advantage of the high interpretability of tabular data and, through attribution and ablation experiments, find that morphometric tabular features, which describe size and shape, have outsized importance during the contrastive learning process and improve the quality of the learned embeddings. Finally, we introduce a novel form of supervised contrastive learning, label as a feature (LaaF), which appends the ground truth label as a tabular feature during multimodal pretraining and outperforms all supervised contrastive baselines.
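To make the ingredients named in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It illustrates three pieces: a SCARF-style tabular augmentation (corrupting features with values drawn from the same column), the LaaF idea of appending the label as an extra tabular column, and a symmetric InfoNCE loss between paired image and tabular embeddings. The abstract does not specify the loss, so the CLIP-style symmetric form is an assumption, as are all function names, the corruption rate, and the temperature. The encoders themselves are omitted, and the embeddings are assumed to be given.

```python
import numpy as np


def scarf_corrupt(x, corruption_rate=0.3, rng=None):
    """SCARF-style view: replace a random subset of features in each row
    with values sampled from the same column of the batch (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = x.shape
    mask = rng.random((n, d)) < corruption_rate      # which cells to corrupt
    donor_rows = rng.integers(0, n, size=(n, d))     # row to borrow each value from
    replacements = x[donor_rows, np.arange(d)]       # stays within the same column
    return np.where(mask, replacements, x)


def append_label_as_feature(x, labels):
    """LaaF sketch: append the ground truth label as one more tabular column."""
    return np.concatenate([x, labels.reshape(-1, 1).astype(x.dtype)], axis=1)


def _log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_info_nce(img_emb, tab_emb, temperature=0.1):
    """Symmetric InfoNCE between paired embeddings (assumed loss form).
    Row i of img_emb and row i of tab_emb are treated as the positive pair;
    all other rows in the batch act as negatives."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    tab = tab_emb / np.linalg.norm(tab_emb, axis=1, keepdims=True)
    logits = img @ tab.T / temperature               # cosine similarities
    idx = np.arange(len(logits))
    loss_i2t = -_log_softmax(logits)[idx, idx].mean()    # image -> tabular
    loss_t2i = -_log_softmax(logits.T)[idx, idx].mean()  # tabular -> image
    return 0.5 * (loss_i2t + loss_t2i)
```

In a training loop, one would typically augment the images (as in SimCLR), corrupt the tabular rows with `scarf_corrupt`, pass each modality through its own encoder and projection head, and minimize `symmetric_info_nce` on the resulting embeddings. After pretraining, either encoder can be used on its own for unimodal prediction.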
