Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models

Breast cancer remains the most commonly diagnosed malignancy among women in the developed world. Early detection through mammography screening plays a pivotal role in reducing mortality. While computer-aided diagnosis (CAD) systems have shown promise in assisting radiologists, existing approaches face critical limitations in clinical deployment: they struggle with the nuanced interpretation of multi-modal data, and their dependence on prior clinical history limits feasibility. This study introduces a framework that combines visual features from 2D mammograms with structured textual descriptors, derived from easily accessible clinical metadata and synthesized radiological reports, through dedicated tokenization modules. We show that integrating convolutional neural networks (ConvNets) with language representations outperforms vision transformer-based models while handling high-resolution images and enabling practical deployment across diverse populations. Evaluated on screening mammograms from multi-national cohorts, our multi-modal approach outperforms unimodal baselines in cancer detection and calcification identification. The proposed method establishes a new paradigm for developing clinically viable VLM-based CAD systems that fuse imaging data with contextual patient information.
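
As an illustration of what turning clinical metadata into structured textual descriptors could look like, the following is a minimal sketch rather than the authors' implementation. The field names (`age`, `view`, `laterality`, `density`), the whitespace tokenizer, and concatenation as the fusion step are all assumptions made for this example; the abstract does not specify the metadata schema, the tokenization modules, or the fusion mechanism.

```python
from typing import Dict, List

# Hypothetical metadata fields; the abstract does not list the actual schema.
FIELDS = ("age", "view", "laterality", "density")


def metadata_to_descriptor(meta: Dict[str, object]) -> str:
    """Render structured metadata as a short text descriptor with a fixed field order."""
    parts = []
    for field in FIELDS:
        value = meta.get(field)
        # Missing fields are written as 'unknown' so every descriptor has the same shape.
        parts.append(f"{field} {value if value is not None else 'unknown'}")
    return " ; ".join(parts)


def build_vocab(descriptors: List[str]) -> Dict[str, int]:
    """Map each lowercased whitespace token to an integer id; id 0 is reserved for unseen tokens."""
    vocab = {"<unk>": 0}
    for text in descriptors:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a descriptor into token ids, falling back to 0 for unknown tokens."""
    return [vocab.get(token, 0) for token in text.lower().split()]


def fuse(image_features: List[float], text_features: List[float]) -> List[float]:
    """Fuse by concatenation (assumed here as the simplest possible fusion step)."""
    return list(image_features) + list(text_features)


# Example record with one field (density) deliberately missing.
example = {"age": 57, "view": "CC", "laterality": "L"}
descriptor = metadata_to_descriptor(example)
vocab = build_vocab([descriptor])
token_ids = encode(descriptor, vocab)
```

In a full model, the token ids would feed a language encoder and `image_features` would come from a ConvNet backbone; plain lists stand in for both here so the sketch stays dependency-free.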
