Lung nodule malignancy prediction has been enhanced by advanced deep-learning techniques and effective training strategies. Nevertheless, current methods are mainly trained with cross-entropy loss on one-hot categorical labels, which makes it difficult to distinguish nodules with close progression labels. Interestingly, we observe that clinical text annotated by radiologists provides discriminative knowledge for identifying challenging samples. Drawing on the ability of the contrastive language-image pre-training (CLIP) model to learn generalized visual representations from text annotations, in this paper we propose CLIP-Lung, a textual-knowledge-guided framework for lung nodule malignancy prediction. First, CLIP-Lung introduces both class and attribute annotations into the training of the lung nodule classifier without any additional overhead at inference. Second, we design a channel-wise conditional prompt (CCP) module to establish consistent relationships between learnable context prompts and specific feature maps. Third, we align image features with both class and attribute features via contrastive learning, rectifying false positives and false negatives in the latent space. Experimental results on the benchmark LIDC-IDRI dataset demonstrate the superiority of CLIP-Lung in both classification performance and the interpretability of attention maps.
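The image–text alignment described in the third contribution follows the general CLIP recipe of contrastive alignment between paired embeddings. The sketch below illustrates a symmetric contrastive (InfoNCE-style) alignment loss of the kind CLIP uses; the function name, shapes, and temperature value are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit hypersphere so that
    # dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_style_alignment_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric contrastive loss aligning image embeddings with
    text (class/attribute) embeddings, in the style of CLIP.

    image_feats, text_feats: (N, D) arrays; row i of each is a matched pair.
    Illustrative sketch only, not the authors' exact objective.
    """
    img = l2_normalize(np.asarray(image_feats, dtype=float))
    txt = l2_normalize(np.asarray(text_feats, dtype=float))
    logits = img @ txt.T / temperature       # (N, N) pairwise similarities
    n = logits.shape[0]
    idx = np.arange(n)                       # matched pairs lie on the diagonal

    def ce_diag(l):
        # Cross-entropy with the diagonal as the target class.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # Average the image->text and text->image directions.
    return 0.5 * (ce_diag(logits) + ce_diag(logits.T))
```

Pulling matched image–text pairs together while pushing mismatched pairs apart is what lets text-side class and attribute knowledge reshape the image latent space.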