Image-text contrastive learning has proven effective for pretraining medical image models. When targeting localized downstream tasks like semantic segmentation or object detection, additional local contrastive losses that align image regions with sentences have shown promising results. We study how local contrastive losses relate to global (per-sample) contrastive losses and what effects they have on localized medical downstream tasks. Based on a theoretical comparison, we propose to remove some components of local losses and to replace others with a novel distribution prior which enforces uniformity of representations within each sample. We empirically study this approach on chest X-ray tasks and find it to be very effective, outperforming methods without local losses on 12 of 18 tasks.
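The uniformity prior mentioned above could, under one common formulation, be sketched as a log-mean-exp Gaussian-potential penalty over a single sample's region embeddings. This is an illustrative sketch only: the function name, the temperature `t`, and the Wang-Isola-style formulation are assumptions, not the paper's exact loss.

```python
import numpy as np

def uniformity_loss(patch_embeddings, t=2.0):
    """Illustrative uniformity prior for one sample's region embeddings.

    Encourages the L2-normalized region representations of a single image
    to spread out on the unit hypersphere, using a log-mean-exp Gaussian
    potential over pairwise distances (an assumed formulation, not
    necessarily the one used in the paper).

    patch_embeddings: array of shape (num_regions, dim).
    Returns a scalar; lower values mean more uniformly spread embeddings.
    """
    # Project each region embedding onto the unit hypersphere.
    z = patch_embeddings / np.linalg.norm(patch_embeddings, axis=1, keepdims=True)
    # Pairwise squared Euclidean distances between all region pairs.
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    # Keep only distances between distinct regions.
    n = z.shape[0]
    off_diag = sq_dists[~np.eye(n, dtype=bool)]
    return float(np.log(np.mean(np.exp(-t * off_diag))))
```

Collapsed (identical) region embeddings give the maximal value of 0, while well-spread embeddings give a negative value, so minimizing this term pushes a sample's regions apart.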