Contrastive Language--Image Pre-training (CLIP) has shown remarkable success in learning via cross-modal supervision from extensive amounts of image--text pairs collected online. Thus far, the effectiveness of CLIP has been investigated primarily in general-domain multimodal problems. This work evaluates the effectiveness of CLIP for the task of Medical Visual Question Answering (MedVQA). To this end, we present PubMedCLIP, a fine-tuned version of CLIP for the medical domain based on PubMed articles. Our experiments are conducted on two MedVQA benchmark datasets and investigate two MedVQA methods, MEVF (Mixture of Enhanced Visual Features) and QCR (Question answering via Conditional Reasoning). For each of these, we assess the merits of visual representation learning using PubMedCLIP, the original CLIP, and state-of-the-art MAML (Model-Agnostic Meta-Learning) networks pre-trained only on visual data. We open-source the code for our MedVQA pipeline and for pre-training PubMedCLIP. Both CLIP and PubMedCLIP improve upon MAML's visual encoder, with PubMedCLIP achieving the best results and gains in overall accuracy of up to 3%. Individual examples illustrate the strengths of PubMedCLIP compared with the previously widely used MAML networks. Visual representation learning with language supervision in PubMedCLIP leads to noticeable improvements for MedVQA. Our experiments also reveal distributional differences between the two MedVQA benchmark datasets that have not been reported in previous work and that cause different back-end visual encoders in PubMedCLIP to exhibit different behavior on these datasets. Moreover, we observe fundamental performance differences between VQA in the general domain and in the medical domain.
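To make the pre-training objective behind CLIP and PubMedCLIP concrete, the following is a minimal NumPy sketch of the symmetric contrastive (InfoNCE) loss that CLIP-style training optimizes over a batch of paired image and text embeddings. The function name, temperature value, and toy embeddings are illustrative, not the paper's actual implementation.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss for CLIP-style pre-training.

    img_emb, txt_emb: (N, d) arrays of paired embeddings; the i-th image
    and i-th text form a matching pair, all other combinations are negatives.
    """
    # L2-normalize so the dot product becomes cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature   # (N, N) pairwise similarities
    labels = np.arange(len(img))         # correct match sits on the diagonal

    def cross_entropy(l, y):
        # numerically stable log-softmax over rows
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # symmetric: image-to-text retrieval plus text-to-image retrieval
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
# perfectly aligned pairs yield a low loss; mismatched pairs a higher one
aligned_loss = clip_contrastive_loss(emb, emb)
random_loss = clip_contrastive_loss(emb, rng.normal(size=(4, 8)))
```

Fine-tuning CLIP into PubMedCLIP amounts to continuing to minimize this objective on medical image--text pairs drawn from PubMed articles rather than general web data.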