Significantly Improving Zero-Shot X-ray Pathology Classification via Fine-tuning Pre-trained Image-Text Encoders

Deep neural networks have been successfully adopted in diverse domains, including pathology classification based on medical images. However, the large-scale, high-quality data needed to train powerful neural networks are rare in the medical domain, because labeling must be done by qualified experts. Researchers recently tackled this problem with some success by taking advantage of models pre-trained on large-scale general-domain data. Specifically, they took contrastive image-text encoders (e.g., CLIP) and fine-tuned them on chest X-ray images and paired reports to perform zero-shot pathology classification, thus completely removing the need for pathology-annotated images to train a classification model. Existing studies, however, fine-tuned the pre-trained model with the same contrastive learning objective used in pre-training, and failed to exploit the multi-labeled nature of medical image-report pairs. In this paper, we propose a new fine-tuning strategy based on sentence sampling and positive-pair loss relaxation for improving downstream zero-shot pathology classification performance; it can be applied to any pre-trained contrastive image-text encoder. Our method consistently improved zero-shot pathology classification performance across four different chest X-ray datasets and three different pre-trained models (5.77% average AUROC increase). In particular, CLIP fine-tuned with our method performed comparably to, or marginally better than, board-certified radiologists (0.619 vs 0.625 in F1 score and 0.530 vs 0.544 in MCC) in zero-shot classification of five prominent diseases from the CheXpert dataset.
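To make the zero-shot setting concrete, the following is a minimal sketch of how a contrastive image-text encoder can score a pathology without any pathology-annotated training images. It is not the method proposed in this paper. It assumes a commonly used prompting scheme in which each pathology gets one positive and one negative text prompt, and it assumes the embeddings have already been produced by the encoders. The function names, the toy embeddings, and the temperature value of 0.07 are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_probability(image_emb, pos_text_emb, neg_text_emb, temperature=0.07):
    """Return P(pathology present) as a two-way softmax over the image's
    similarity to a positive prompt and to a negative prompt.

    The temperature is an assumed value, not one taken from the paper.
    """
    pos_logit = cosine(image_emb, pos_text_emb) / temperature
    neg_logit = cosine(image_emb, neg_text_emb) / temperature
    # Subtract the max logit before exponentiating for numerical stability.
    top = max(pos_logit, neg_logit)
    pos_exp = math.exp(pos_logit - top)
    neg_exp = math.exp(neg_logit - top)
    return pos_exp / (pos_exp + neg_exp)


# Toy 3-dimensional embeddings standing in for real encoder outputs.
image_emb = [0.9, 0.1, 0.2]
prompt_embs = {
    # pathology: (embedding of a positive prompt, embedding of a negative prompt)
    "Pleural Effusion": ([1.0, 0.0, 0.1], [0.0, 1.0, 0.1]),
    "Cardiomegaly": ([0.0, 1.0, 0.3], [1.0, 0.0, 0.3]),
}

# Each pathology is scored independently, so one image can be positive
# for several pathologies at once (multi-label prediction).
scores = {
    name: zero_shot_probability(image_emb, pos, neg)
    for name, (pos, neg) in prompt_embs.items()
}
```

Because each pathology is scored with its own prompt pair, the per-pathology probabilities can be thresholded separately to compute metrics such as F1 and MCC, or used directly as scores for AUROC.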
