In clinical practice, a radiology report is crucial for guiding a patient's treatment. Unfortunately, report writing imposes a heavy burden on radiologists. To reduce this burden, we present an automatic, multi-modal approach for report generation from chest X-rays. Motivated by the observation that the descriptions in radiology reports are highly correlated with the X-ray images, our approach features two distinct modules: (i) Learned knowledge base. To absorb the knowledge embedded in this correlation, we automatically build a knowledge base based on textual embeddings. (ii) Multi-modal alignment. To promote semantic alignment among reports, disease labels, and images, we explicitly use textual embeddings to guide the learning of the visual feature space. We evaluate the proposed model on the public IU and MIMIC-CXR datasets using both natural language generation and clinical efficacy metrics. Our ablation study shows that each module contributes to improving the quality of the generated reports, and with both modules combined, our approach clearly outperforms state-of-the-art methods.
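As a rough illustration of the two modules described above, the following is a minimal, hypothetical PyTorch-style sketch, not the authors' actual implementation. It assumes the knowledge base is a set of trainable embedding vectors that visual features query via attention, and that multi-modal alignment is realized as a contrastive loss between image and report embeddings; all names (`KnowledgeBase`, `alignment_loss`, `n_entries`, `d_model`, `temperature`) are illustrative inventions.

```python
# Hypothetical sketch of (i) a learned knowledge base and (ii) a
# multi-modal alignment loss; names and design details are assumptions,
# not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KnowledgeBase(nn.Module):
    """Learned knowledge base: trainable embedding vectors (which the
    paper suggests building from textual embeddings) that visual
    features retrieve from via scaled dot-product attention."""

    def __init__(self, n_entries: int = 64, d_model: int = 512):
        super().__init__()
        self.entries = nn.Parameter(torch.randn(n_entries, d_model))

    def forward(self, visual_feats: torch.Tensor) -> torch.Tensor:
        # visual_feats: (batch, n_regions, d_model)
        attn = torch.softmax(
            visual_feats @ self.entries.T / self.entries.size(-1) ** 0.5,
            dim=-1,
        )
        # Enrich each visual region with retrieved knowledge entries.
        return visual_feats + attn @ self.entries


def alignment_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    """Contrastive image-text alignment: pull each image embedding
    toward the embedding of its paired report and away from the other
    reports in the batch."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature   # (batch, batch)
    targets = torch.arange(img_emb.size(0))      # matched pairs on diagonal
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    kb = KnowledgeBase()
    feats = torch.randn(4, 49, 512)              # e.g. a 7x7 CNN feature grid
    enriched = kb(feats)                         # (4, 49, 512)
    loss = alignment_loss(enriched.mean(dim=1), torch.randn(4, 512))
    print(enriched.shape, loss.item())
```

Under these assumptions, the alignment loss shapes the visual feature space with textual supervision, while the knowledge-base attention injects report-derived knowledge into the image features before decoding.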