Automated radiographic report generation is a challenging cross-domain task that aims to automatically generate accurate and semantically coherent reports describing medical images. Despite recent progress in this field, many challenges remain, at least in the following respects. First, radiographic images are very similar to one another, so the fine-grained visual differences are difficult to capture with a CNN visual feature extractor, as many existing methods do. Second, semantic information has been widely used to boost the performance of generation tasks (e.g., image captioning), yet existing methods often fail to provide effective medical semantic features. To address these problems, we propose a memory-augmented sparse attention block that utilizes bilinear pooling to capture higher-order interactions among the fine-grained input image features while producing sparse attention. Moreover, we introduce a novel Medical Concepts Generation Network (MCGN) that predicts fine-grained semantic concepts and incorporates them into the report generation process as guidance. Our proposed method shows promising performance on MIMIC-CXR, the largest recently released benchmark, and outperforms multiple state-of-the-art methods in image captioning and medical report generation.
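To make the attention mechanism described above concrete, the following is a minimal NumPy sketch of sparse attention over visual features using a low-rank bilinear pooling score. It assumes a low-rank factorization with elementwise products for the bilinear interaction and sparsemax (projection onto the probability simplex) for the sparse normalization; the paper's actual block additionally involves memory augmentation, and all function and parameter names here (`Wq`, `Wk`, `w`) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def sparsemax(z):
    # Euclidean projection of the score vector onto the probability simplex.
    # Unlike softmax, this assigns exactly zero weight to low-scoring
    # entries, which is what makes the attention distribution sparse.
    z_sorted = np.sort(z)[::-1]               # scores in descending order
    cssv = np.cumsum(z_sorted)                # cumulative sums of sorted scores
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cssv         # entries kept in the support
    k_max = k[support][-1]                    # size of the support set
    tau = (cssv[k_max - 1] - 1.0) / k_max     # threshold
    return np.maximum(z - tau, 0.0)

def bilinear_sparse_attention(q, K, Wq, Wk, w):
    # Low-rank bilinear pooling: project the query and each key (image
    # region feature) into a joint space, take their elementwise product
    # to capture second-order interactions, then score each region and
    # normalize the scores with sparsemax instead of softmax.
    joint = np.tanh(K @ Wk.T) * np.tanh(Wq @ q)   # (n_regions, d_joint)
    scores = joint @ w                            # (n_regions,)
    alpha = sparsemax(scores)                     # sparse attention weights
    return alpha @ K, alpha                       # attended feature, weights

# Toy usage: 5 region features of dim 8, joint space of dim 4.
rng = np.random.default_rng(0)
K = rng.standard_normal((5, 8))
q = rng.standard_normal(8)
Wq = rng.standard_normal((4, 8))
Wk = rng.standard_normal((4, 8))
w = rng.standard_normal(4)
attended, alpha = bilinear_sparse_attention(q, K, Wq, Wk, w)
```

The key behavioral difference from softmax attention is that `alpha` can contain exact zeros, so the model attends to only a few image regions, which suits the fine-grained differences between otherwise similar radiographs.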