Automatic generation of radiology reports has the potential to alleviate radiologists' significant workload, yet current methods struggle to deliver clinically reliable conclusions. In particular, most prior approaches focus on producing fluent text without effectively ensuring the factual correctness of the reports, and they often rely on single-view images, limiting diagnostic comprehensiveness. We propose CLARIFID, a novel framework that directly optimizes diagnostic correctness by mirroring the two-step workflow of experts. Specifically, CLARIFID (1) learns the logical flow from Findings to Impression through section-aware pretraining, (2) is fine-tuned with Proximal Policy Optimization using the CheXbert F1 score of the Impression section as the reward, (3) employs controlled decoding that completes the "Findings" section before synthesizing the "Impression", and (4) fuses multiple chest X-ray views via a vision-transformer-based multi-view encoder. During inference, we apply a next-token forcing strategy followed by report-level re-ranking, ensuring that the model first produces a comprehensive "Findings" section before synthesizing the "Impression", thereby preserving coherent clinical reasoning. Experimental results on the MIMIC-CXR dataset demonstrate that our method outperforms existing baselines on clinical-efficacy metrics.
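The next-token forcing idea in step (3) can be illustrated with a minimal, hypothetical decoding loop (the function names and token conventions below are illustrative, not the paper's actual implementation): whenever the model tries to stop or jump to the Impression header before the Findings section is complete, the decoder forces the "IMPRESSION:" token so the two sections are always generated in order.

```python
def force_section_order(step_fn, max_len=50):
    """Greedy decoding sketch that forces an Impression after Findings.

    step_fn(tokens) -> next token (a word string) proposed by some model;
    '<eos>' ends generation. Tokens are plain words for illustration.
    """
    tokens = ["FINDINGS:"]
    in_findings = True
    while len(tokens) < max_len:
        nxt = step_fn(tokens)
        if in_findings and nxt in ("<eos>", "IMPRESSION:"):
            # Findings complete: force the Impression header instead of
            # letting the model terminate or skip ahead prematurely.
            tokens.append("IMPRESSION:")
            in_findings = False
            continue
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens


# Toy "model" that would otherwise stop right after the Findings text.
script = iter(["clear", "lungs", "<eos>", "no", "acute", "disease", "<eos>"])
report = force_section_order(lambda toks: next(script))
# → ['FINDINGS:', 'clear', 'lungs', 'IMPRESSION:', 'no', 'acute', 'disease']
```

In the full system, report-level re-ranking would then score several such complete candidates (e.g., by the CheXbert-based reward) and keep the best one.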