Doctor Sun: A Bilingual Multimodal Large Language Model for Biomedical AI

Large multimodal models (LMMs) have demonstrated significant potential for biomedical tasks such as pathology analysis, radiology report generation, and biomedical assistance. However, existing multimodal biomedical AI systems are typically built on foundation LLMs, which, given limited medical training data, hinders their understanding of intricate medical concepts. Moreover, recent LLaVA-derived medical LMMs struggle to capture the intricate relationship between text and images. We therefore introduce Doctor Sun, a large multimodal generative model specialized in medicine, developed to encode, integrate, and interpret diverse biomedical data modalities such as text and images. Specifically, Doctor Sun integrates a pre-trained vision encoder with a medical LLM and is trained in two stages on a variety of medical datasets: feature alignment, followed by instruction tuning. In addition, we release SunMed-VL, a broad bilingual medical multimodal dataset, along with all associated models, code, and resources, made freely available to support the advancement of biomedical multimodal research.
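The two-stage recipe described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The component names, the linear projector, and the choice of which parts are trainable in each stage are assumptions borrowed from the common LLaVA-style setup; the abstract does not specify them.

```python
# Hypothetical component names; the abstract does not name them.
COMPONENTS = ("vision_encoder", "projector", "medical_llm")

# Assumed LLaVA-style schedule (not confirmed by the source):
# stage 1 trains only the projector, stage 2 also updates the LLM.
STAGE_TRAINABLE = {
    "feature_alignment": {"projector"},
    "instruction_tuning": {"projector", "medical_llm"},
}


def trainable_flags(stage):
    """Return a {component: is_trainable} mapping for the given stage."""
    if stage not in STAGE_TRAINABLE:
        raise ValueError(f"unknown stage: {stage!r}")
    return {name: name in STAGE_TRAINABLE[stage] for name in COMPONENTS}


def project(patch_features, weight):
    """Linearly map vision features (n_patches x d_vision) into the
    LLM embedding space (d_llm); weight has shape d_vision x d_llm."""
    columns = list(zip(*weight))  # each column has d_vision entries
    return [
        [sum(x * w for x, w in zip(patch, col)) for col in columns]
        for patch in patch_features
    ]


def build_input_sequence(patch_features, weight, text_embeddings):
    """Prepend projected image tokens to the text token embeddings."""
    return project(patch_features, weight) + text_embeddings


if __name__ == "__main__":
    for stage in STAGE_TRAINABLE:
        print(stage, trainable_flags(stage))
```

In this sketch the vision encoder stays frozen throughout, so only the projector has to learn the mapping between the two embedding spaces during feature alignment. A real system would express the same schedule by toggling gradient updates on the corresponding parameter groups.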
