Chest X-ray is one of the most widely used medical imaging modalities owing to its accessibility and effectiveness. However, there is a chronic shortage of well-trained radiologists who can interpret these images and diagnose the patient's condition, so automated radiology report generation can be a very helpful tool in clinical practice. A typical report generation workflow consists of two main steps: (i) encoding the image into a latent space and (ii) generating the text of the report from the latent image embedding. Many existing report generation techniques use a standard convolutional neural network (CNN) architecture for image encoding, followed by a Transformer-based decoder for medical text generation. In most cases, the CNN and the decoder are trained jointly in an end-to-end fashion. In this work, we focus primarily on understanding the relative importance of the encoder and decoder components. To this end, we analyze four different image encoding approaches (direct, fine-grained, CLIP-based, and Cluster-CLIP-based encodings) in conjunction with three different decoders on the large-scale MIMIC-CXR dataset. Among these encoders, the Cluster-CLIP visual encoder is a novel approach that aims to generate more discriminative and explainable representations. CLIP-based encoders produce results comparable to those of traditional CNN-based encoders in terms of NLP metrics, while fine-grained encoding outperforms all other encoders in terms of both NLP and clinical accuracy metrics, thereby validating the importance of the image encoder in effectively extracting semantic information. GitHub repository: https://github.com/mudabek/encoding-cxr-report-gen
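The two-step workflow described above, (i) encoding the image into a latent embedding and (ii) generating text from that embedding, can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not taken from the paper or its repository: the patch-averaging `encode_image` stands in for a CNN or CLIP encoder, `generate_report` is a plain greedy decoding loop rather than a Transformer, and `toy_score`, `VOCAB`, and the brightness threshold are hypothetical placeholders with no clinical meaning.

```python
from typing import Callable, List

Image = List[List[float]]   # grayscale image as rows of intensities in [0, 1]
Embedding = List[float]     # latent image embedding
ScoreFn = Callable[[Embedding, List[str], str], float]


def encode_image(image: Image, patch: int = 2) -> Embedding:
    """Step (i): toy encoder. Mean intensity of each non-overlapping patch.

    Hypothetical stand-in for a learned CNN/CLIP image encoder.
    """
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            block = [image[r][c]
                     for r in range(top, min(top + patch, height))
                     for c in range(left, min(left + patch, width))]
            features.append(sum(block) / len(block))
    return features


def generate_report(embedding: Embedding, vocab: List[str], score: ScoreFn,
                    max_len: int = 8, eos: str = "<eos>") -> List[str]:
    """Step (ii): greedy decoding conditioned on the image embedding."""
    tokens: List[str] = []
    for _ in range(max_len):
        # Pick the highest-scoring next token given the embedding and prefix.
        best = max(vocab, key=lambda tok: score(embedding, tokens, tok))
        if best == eos:
            break
        tokens.append(best)
    return tokens


def toy_score(embedding: Embedding, prefix: List[str], token: str) -> float:
    """Hypothetical scorer (not clinically meaningful): follows a fixed
    template chosen by the mean of the embedding."""
    bright = sum(embedding) / len(embedding) > 0.5
    template = ["opacity", "present"] if bright else ["no", "opacity"]
    target = template[len(prefix)] if len(prefix) < len(template) else "<eos>"
    return 1.0 if token == target else 0.0


VOCAB = ["no", "opacity", "present", "<eos>"]

dark_image = [[0.0] * 4 for _ in range(4)]
embedding = encode_image(dark_image)                     # four patch features
report = generate_report(embedding, VOCAB, toy_score)
print(embedding, report)
```

Keeping the encoder and decoder behind separate functions mirrors the study design: one component can be swapped while the other stays fixed, which is how the relative importance of each can be compared.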