In clinical practice, a radiology report is crucial for guiding a patient's treatment. However, writing radiology reports is a heavy burden for radiologists. To this end, we present an automatic, multi-modal approach for report generation from a chest x-ray. Our approach, motivated by the observation that the descriptions in radiology reports are highly correlated with specific information in the x-ray images, features two distinct modules: (i) Learned knowledge base: to absorb the knowledge embedded in radiology reports, we build a knowledge base that can automatically distil and restore medical knowledge from textual embeddings without manual labour; (ii) Multi-modal alignment: to promote semantic alignment among reports, disease labels, and images, we explicitly utilize textual embeddings to guide the learning of the visual feature space. We evaluate the performance of the proposed model using both natural language generation and clinical efficacy metrics on the public IU-Xray and MIMIC-CXR datasets. Our ablation study shows that each module contributes to improving the quality of the generated reports. Furthermore, with the assistance of both modules, our approach outperforms state-of-the-art methods on almost all metrics.
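The abstract does not specify how textual embeddings guide the visual feature space. As an illustrative sketch only (not the paper's actual method), one common way to encourage such cross-modal alignment is a cosine-similarity loss between paired image features and report embeddings; all names here (`alignment_loss`, the feature dimensions) are hypothetical:

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Normalize each row vector to unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def alignment_loss(image_feats, text_embeds):
    """Mean (1 - cosine similarity) over paired image/text vectors.

    A sketch of one possible alignment objective: pulling each image's
    visual feature toward the embedding of its paired report text.
    """
    img = l2_normalize(image_feats)
    txt = l2_normalize(text_embeds)
    cos = np.sum(img * txt, axis=-1)          # per-pair cosine similarity
    return float(np.mean(1.0 - cos))          # 0 when perfectly aligned

# Toy check: identical pairs give (near-)zero loss; random pairs do not.
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 16))
loss_aligned = alignment_loss(text.copy(), text)
loss_random = alignment_loss(rng.normal(size=(4, 16)), text)
```

In practice such a loss would be minimized jointly with the report-generation objective, so that visually similar studies with similar findings end up close in the shared feature space.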