An important component of human analysis of medical images and their context is the ability to relate newly encountered cases to similar instances in memory. In this paper we mimic this ability using multi-modal retrieval augmentation, applying it to several tasks in chest X-ray analysis. By retrieving similar images and/or radiology reports we expand and regularize the case at hand with additional knowledge, while maintaining factual knowledge consistency. The method consists of two components. First, vision and language modalities are aligned using a pre-trained CLIP model. To ensure that retrieval focuses on detailed disease-related content rather than global visual appearance, the model is fine-tuned using disease class information. Subsequently, we construct a non-parametric retrieval index, which achieves state-of-the-art retrieval performance. We use this index in our downstream tasks to augment image representations through multi-head attention for disease classification and report retrieval. We show that retrieval augmentation yields considerable improvements on these tasks. Our downstream report retrieval even proves competitive with dedicated report generation methods, paving the way for this approach in medical imaging.
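The retrieval-augmented fusion described above can be pictured with a minimal sketch. This is an illustrative reconstruction, not the paper's implementation: `RetrievalAugmentedClassifier`, `retrieve`, the embedding dimension, the number of neighbours, and the label count are all assumed names and values. It shows the two stated ingredients: non-parametric top-k retrieval over aligned CLIP-style embeddings, and multi-head attention that fuses the retrieved neighbours into the query image representation before classification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative constants (assumptions, not from the paper):
# embedding dim, retrieved neighbours, number of disease labels.
D, K, NUM_CLASSES = 512, 8, 14

class RetrievalAugmentedClassifier(nn.Module):
    """Sketch: fuse a query image embedding with retrieved neighbour
    embeddings via multi-head attention, then classify diseases."""
    def __init__(self, dim=D, heads=8, num_classes=NUM_CLASSES):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, query_emb, neighbour_embs):
        # query_emb: (B, D); neighbour_embs: (B, K, D) from the index
        q = query_emb.unsqueeze(1)                    # (B, 1, D)
        fused, _ = self.attn(q, neighbour_embs, neighbour_embs)
        rep = (q + fused).squeeze(1)                  # residual fusion
        return self.head(rep)                         # disease logits

def retrieve(index_embs, query_emb, k=K):
    """Non-parametric retrieval: cosine similarity over the index."""
    index = F.normalize(index_embs, dim=-1)           # (N, D)
    q = F.normalize(query_emb, dim=-1)                # (B, D)
    top = (q @ index.T).topk(k, dim=-1).indices       # (B, K)
    return index_embs[top]                            # (B, K, D)

# Usage sketch: embeddings would come from the fine-tuned CLIP encoder.
index_embs = torch.randn(1000, D)                     # stored reference cases
query_emb = torch.randn(4, D)                         # batch of query images
logits = RetrievalAugmentedClassifier()(query_emb, retrieve(index_embs, query_emb))
```

Here the same attention-fused representation could equally feed the report retrieval head; the sketch keeps only the classification branch for brevity.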