Medical images are widely used in clinical practice for diagnosis. Automatically generating interpretable medical reports can reduce radiologists' burden and facilitate timely care. However, most existing approaches to automatic report generation require sufficient labeled data for training. In addition, the learned model can only generate reports for the training classes, lacking the ability to adapt to previously unseen novel diseases. To this end, we propose a lesion-guided, explainable, few weak-shot medical report generation framework that learns correlations between seen and novel classes through visual and semantic feature alignment, aiming to generate medical reports for diseases not observed during training. The framework integrates a lesion-centric feature extractor and a Transformer-based report generation module. Concretely, the lesion-centric feature extractor detects abnormal regions and learns correlations between seen and novel classes via multi-view (visual and lexical) embeddings. Features of the detected regions and their corresponding embeddings are then concatenated as multi-view input to the report generation module, which produces explainable reports consisting of text descriptions and the corresponding abnormal regions detected in the images. Experiments on FFA-IR, a dataset providing explainable annotations, show that our framework outperforms existing methods on report generation for novel diseases.
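To make the data flow concrete, below is a minimal sketch, in PyTorch, of how the multi-view conditioning described above could be wired: visual features of detected lesion regions and lexical embeddings of disease names are projected into a shared space, concatenated into one memory, and attended to by a Transformer decoder that emits report tokens. All module names, dimensions, and the random stand-ins for detector outputs are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class LesionGuidedReportGenerator(nn.Module):
    """Illustrative sketch of the two-stage pipeline from the abstract.

    Assumptions: the lesion detector and the alignment objective are external
    to this module; dimensions and layer counts are placeholders.
    """

    def __init__(self, visual_dim=512, lexical_dim=300, d_model=256, vocab_size=10000):
        super().__init__()
        # Project lesion-region visual features and disease-name (lexical)
        # embeddings into a shared space, so seen and novel classes can be
        # related through a common representation.
        self.visual_proj = nn.Linear(visual_dim, d_model)
        self.lexical_proj = nn.Linear(lexical_dim, d_model)
        # Transformer decoder generates the report conditioned on the
        # concatenated multi-view (visual + lexical) memory.
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, region_feats, class_embeds, report_tokens):
        # region_feats:  (B, R, visual_dim)  features of detected abnormal regions
        # class_embeds:  (B, C, lexical_dim) lexical embeddings of disease names
        # report_tokens: (B, T)              report tokens (teacher forcing)
        memory = torch.cat(
            [self.visual_proj(region_feats), self.lexical_proj(class_embeds)], dim=1
        )  # multi-view memory: (B, R + C, d_model)
        tgt = self.token_embed(report_tokens)
        # Causal mask so each position only attends to earlier report tokens.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(report_tokens.size(1))
        hidden = self.decoder(tgt, memory, tgt_mask=tgt_mask)
        return self.lm_head(hidden)  # (B, T, vocab_size) token logits


# Toy usage with random tensors standing in for detector and embedding outputs.
model = LesionGuidedReportGenerator()
regions = torch.randn(2, 5, 512)            # 5 detected lesion regions per image
classes = torch.randn(2, 3, 300)            # 3 candidate disease-name embeddings
tokens = torch.randint(0, 10000, (2, 20))   # 20-token report prefix
logits = model(regions, classes, tokens)
print(logits.shape)  # torch.Size([2, 20, 10000])
```

Because the decoder cross-attends over both the region features and the class embeddings, the generated text can be traced back to specific detected regions, which is what makes the reports explainable in this design.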