BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

Sahal Shaji Mullappilly, Mohammed Irfan Kurpath, Sara Pieri, Saeed Yahya Alseiari, Shanavas Cholakkal, Khaled Aldahmani, Fahad Khan, Rao Anwer, Salman Khan, Timothy Baldwin, Hisham Cholakkal

From arXiv; accepted to EMNLP 2025 (Findings)

We introduce BiMediX2, a bilingual (Arabic-English) Bio-Medical EXpert Large Multimodal Model that supports text-based and image-based medical interactions. It enables multi-turn conversation in Arabic and English and supports diverse medical imaging modalities, including radiology, CT, and histology. To train BiMediX2, we curate BiMed-V, an extensive Arabic-English bilingual healthcare dataset consisting of 1.6M samples of diverse medical interactions. This dataset supports a range of medical Large Language Model (LLM) and Large Multimodal Model (LMM) tasks, including multi-turn medical conversations, report generation, and visual question answering (VQA). We also introduce BiMed-MBench, the first Arabic-English medical LMM evaluation benchmark, verified by medical experts. BiMediX2 demonstrates excellent performance across multiple medical LLM and LMM benchmarks, achieving state-of-the-art results compared to other open-sourced models. On BiMed-MBench, BiMediX2 outperforms existing methods by over 9% in English and more than 20% in Arabic evaluations. Additionally, it surpasses GPT-4 by approximately 9% in UPHILL factual accuracy evaluations and excels in various medical VQA, report generation, and report summarization tasks. Our trained models, instruction set, and source code are available at https://github.com/mbzuai-oryx/BiMediX2
