We introduce BiMediX2, a bilingual (Arabic-English) Bio-Medical EXpert Large Multimodal Model that supports text-based and image-based medical interactions. It enables multi-turn conversations in Arabic and English and supports diverse medical imaging modalities, including radiology, CT, and histology. To train BiMediX2, we curate BiMed-V, an extensive Arabic-English bilingual healthcare dataset consisting of 1.6M samples of diverse medical interactions. This dataset supports a range of medical Large Language Model (LLM) and Large Multimodal Model (LMM) tasks, including multi-turn medical conversations, report generation, and visual question answering (VQA). We also introduce BiMed-MBench, the first Arabic-English medical LMM evaluation benchmark, verified by medical experts. BiMediX2 performs strongly across multiple medical LLM and LMM benchmarks, achieving state-of-the-art results among open-source models. On BiMed-MBench, BiMediX2 outperforms existing methods by over 9% in English and more than 20% in Arabic evaluations. Additionally, it surpasses GPT-4 by approximately 9% in UPHILL factual accuracy evaluations and excels in various medical VQA, report generation, and report summarization tasks. Our trained models, instruction set, and source code are available at https://github.com/mbzuai-oryx/BiMediX2