Medical report generation is challenging: writing reports by hand is time-consuming and requires the expertise of experienced radiologists. The goal of medical report generation is to accurately capture and describe the findings in an image. Previous works pretrain their visual encoders on large datasets from other domains, so the encoders do not learn visual representations that generalize within the medical domain. In this work, we propose a medical report generation framework that pretrains the visual encoder with contrastive learning and requires no additional meta information. In addition, we adopt lung segmentation as an augmentation method in the contrastive learning framework. The segmentation guides the network to focus on encoding visual features within the lung region. Experimental results show that the proposed framework improves report generation performance and the quality of the generated medical reports, both quantitatively and qualitatively.
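To make the idea of segmentation as a contrastive augmentation concrete, the following is a minimal sketch and not the authors' implementation. It assumes an InfoNCE-style loss, a toy linear encoder in place of a real visual network, and a rectangular placeholder mask in place of the output of a lung segmentation model; the abstract specifies none of these, and all function names here are hypothetical.

```python
import numpy as np

def mask_augment(images, lung_masks):
    """Create a second view by zeroing every pixel outside the lung mask."""
    return images * lung_masks

def encode(images, weights):
    """Toy linear encoder (assumed stand-in for a CNN): flatten, project, L2-normalize."""
    flat = images.reshape(images.shape[0], -1)
    z = flat @ weights
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss (assumed): row i of z_a should match row i of z_b
    and be dissimilar to every other row of z_b."""
    logits = (z_a @ z_b.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
images = rng.random((4, 8, 8))        # four fake 8x8 "X-rays"
lung_masks = np.zeros((4, 8, 8))
lung_masks[:, 2:6, 1:7] = 1.0         # placeholder for a real lung segmentation
weights = rng.normal(size=(64, 16))   # untrained projection weights

z_full = encode(images, weights)
z_lung = encode(mask_augment(images, lung_masks), weights)
loss = info_nce(z_full, z_lung)
print(f"contrastive loss: {loss:.4f}")
```

In this sketch, minimizing the loss pulls the embedding of each full image toward the embedding of its lung-only view, which is one plausible way the segmentation could steer the encoder toward features inside the lung region.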