Deep neural networks have been successfully applied to diverse domains, including pathology classification from medical images. However, large-scale, high-quality data for training powerful neural networks are scarce in the medical domain, as labeling must be done by qualified experts. Researchers have recently tackled this problem with some success by leveraging models pre-trained on large-scale general-domain data. Specifically, they took contrastive image-text encoders (e.g., CLIP) and fine-tuned them on chest X-ray images and paired radiology reports to perform zero-shot pathology classification, completely removing the need for pathology-annotated images to train a classification model. Existing studies, however, fine-tuned the pre-trained model with the same contrastive learning objective and failed to exploit the multi-labeled nature of medical image-report pairs. In this paper, we propose a new fine-tuning strategy based on sentence sampling and positive-pair loss relaxation that improves downstream zero-shot pathology classification and can be applied to any pre-trained contrastive image-text encoder. Our method consistently and substantially improved zero-shot pathology classification across four chest X-ray datasets and three pre-trained models (5.77% average AUROC increase). In particular, CLIP fine-tuned with our method achieved performance comparable to that of board-certified radiologists (0.619 vs. 0.625 in F1 score and 0.530 vs. 0.544 in MCC) on zero-shot classification of five prominent diseases from the CheXpert dataset.
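To make the idea concrete, below is a minimal PyTorch-style sketch of how report-sentence sampling and a relaxed positive-pair contrastive objective could be combined when fine-tuning a CLIP-style encoder pair. It is an illustrative sketch only: the naive period-based sentence splitter, the `relax` hyperparameter, and the label-smoothing form of the relaxation are assumptions made for exposition, not the exact formulation used in the paper.

```python
import random

import torch
import torch.nn.functional as F


def sample_sentence(report: str) -> str:
    """Sample one sentence from a multi-sentence radiology report.

    Splitting on '.' is a crude stand-in for a proper sentence tokenizer.
    """
    sentences = [s.strip() for s in report.split(".") if s.strip()]
    return random.choice(sentences) if sentences else report


def relaxed_contrastive_loss(image_emb: torch.Tensor,
                             text_emb: torch.Tensor,
                             temperature: float = 0.07,
                             relax: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE loss with a softened positive-pair target.

    Instead of a one-hot target on the diagonal, a small probability mass
    `relax` (a hypothetical hyperparameter) is spread over the other pairs
    in the batch, so the model is not forced to push apart images and
    sentences that may describe the same pathology.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (B, B) similarity matrix

    n = logits.size(0)
    targets = torch.full((n, n), relax / max(n - 1, 1), device=logits.device)
    targets.fill_diagonal_(1.0 - relax)

    loss_i2t = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(targets * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

In this framing, sentence sampling pairs an image with one reported finding at a time, and the softened targets keep the objective from treating every other image-sentence pair in the batch as a hard negative, which is the mismatch that the multi-labeled nature of image-report pairs creates for a standard contrastive loss.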