A Disease-Aware Dual-Stage Framework for Chest X-ray Report Generation

Radiology report generation from chest X-rays is an important task in artificial intelligence, with the potential to greatly reduce radiologists' workload and shorten patient wait times. Despite recent advances, existing approaches often lack sufficient disease awareness in their visual representations, and their vision-language alignment falls short of the specialized requirements of medical image analysis. As a result, these models often overlook critical pathological features in chest X-rays and struggle to generate clinically accurate reports. To address these limitations, we propose a novel dual-stage disease-aware framework for chest X-ray report generation. In Stage 1, our model learns Disease-Aware Semantic Tokens (DASTs) corresponding to specific pathology categories through cross-attention mechanisms and multi-label classification, while simultaneously aligning vision and language representations via contrastive learning. In Stage 2, we introduce a Disease-Visual Attention Fusion (DVAF) module that integrates disease-aware representations with visual features, along with a Dual-Modal Similarity Retrieval (DMSR) mechanism that combines visual and disease-specific similarities to retrieve relevant exemplars, providing contextual guidance during report generation. Extensive experiments on three benchmark datasets (CheXpert Plus, IU X-ray, and MIMIC-CXR) demonstrate that our disease-aware framework achieves state-of-the-art performance in chest X-ray report generation, with significant improvements in clinical accuracy and linguistic quality.
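
To make the retrieval idea concrete, the following is a minimal sketch of how a dual-modal similarity score might rank exemplars. The abstract states only that DMSR combines visual and disease-specific similarities; the weighted sum, the weight `alpha`, the use of cosine similarity, the function names, and the toy vectors below are all illustrative assumptions rather than the method's actual formulation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical exemplar record: (report_id, visual_embedding, disease_vector).
Exemplar = Tuple[str, Sequence[float], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_exemplars(
    query_visual: Sequence[float],
    query_disease: Sequence[float],
    bank: List[Exemplar],
    alpha: float = 0.5,
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank exemplars by an assumed weighted sum of two similarities.

    alpha weights the visual similarity and (1 - alpha) weights the
    disease-specific similarity. This combination rule is an assumption.
    """
    scored = []
    for report_id, visual, disease in bank:
        score = (alpha * cosine(query_visual, visual)
                 + (1.0 - alpha) * cosine(query_disease, disease))
        scored.append((report_id, score))
    # Highest combined similarity first; keep the top k.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy exemplar bank with 2-d visual embeddings and 3-d disease vectors.
toy_bank: List[Exemplar] = [
    ("r1", [1.0, 0.0], [1.0, 0.0, 0.0]),
    ("r2", [0.0, 1.0], [1.0, 0.0, 0.0]),
    ("r3", [1.0, 0.0], [0.0, 1.0, 0.0]),
]
top = retrieve_exemplars([1.0, 0.0], [1.0, 0.0, 0.0], toy_bank, alpha=0.3)
print([report_id for report_id, _ in top])
```

In this sketch, a small `alpha` favors exemplars that share the query's disease profile even when they look different, while a large `alpha` favors visually similar exemplars. How the framework itself balances the two signals is not specified in the abstract.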
