We present brat (brain report alignment transformer), a multi-view representation learning framework for brain magnetic resonance imaging (MRI), trained on MRIs paired with clinical reports. Brain MRIs present unique challenges due to the presence of numerous, highly varied, and often subtle abnormalities that are localized to a few slices within a 3D volume. To address these challenges, we introduce a brain MRI dataset $10\times$ larger than existing ones, containing approximately 80,000 3D scans with corresponding radiology reports, and propose a multi-view pre-training approach inspired by advances in document retrieval. We develop an implicit query-feature matching mechanism and adopt concepts from quality-diversity optimization to obtain multi-view embeddings of MRIs that are aligned with the clinical features described in report sentences. We evaluate our approach across multiple vision-language and vision tasks, demonstrating substantial performance improvements. The brat foundation models are publicly released.