Patient management requires performing multiple clinical tasks on multimodal data. While today's AI, particularly large foundation models, promises unprecedented opportunities, progress toward medical multimodal multitask foundation models has been relatively slow. Two main challenges stand in the way: the data challenge, namely the high bar of curating medical multimodal multitask datasets that align 3D medical tomographic images with other clinical data; and the model challenge, namely the lack of a scalable and adaptable foundation model architecture that synergizes multimodal datasets across diverse clinical tasks. Here we propose M3FM, a first-of-its-kind medical multimodal multitask foundation model, with an emphasis on lung cancer screening. To train M3FM, we first curated a comprehensive multimodal multitask dataset consisting of 163,725 3D chest CT exams, 48 clinical data types, and 17 medical tasks covering lung, heart, and other chest diseases. We then created and applied a multimodal question-answering framework as a unified training strategy that integrates multimodal information and naturally performs multiple tasks through free-text prompting. Extensive experimental results demonstrate that M3FM consistently outperforms previous state-of-the-art models. M3FM identifies the multimodal data elements informative for specific clinical tasks, which is instrumental both for building AI models and for gaining insight into correlations between multimodal data and diseases. M3FM can be adapted to boost performance on new tasks using a small out-of-distribution dataset. M3FM enables superior volumetric CT imaging performance for lung cancer screening, cardiac disease prediction, and other CT-related tasks. M3FM can be extended to incorporate more data types and improve other medical tasks, toward AI-empowered precise and efficient medicine.
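To make the multimodal question-answering idea concrete, below is a minimal PyTorch sketch, not the authors' released code, of how such a framework can be structured: a 3D CT volume is embedded into patch tokens, clinical data are serialized into the free-text prompt, the two token streams are fused by a transformer, and the task is answered as text via a token-level loss. All class names, layer sizes, and the example prompt are illustrative assumptions rather than the paper's actual architecture.

```python
# Minimal multimodal QA sketch (hypothetical names and sizes throughout).
import torch
import torch.nn as nn

class CTPatchEncoder(nn.Module):
    """Embeds a 3D CT volume into a sequence of patch tokens (assumed sizes)."""
    def __init__(self, patch=(4, 16, 16), dim=256):
        super().__init__()
        # Non-overlapping 3D patches via a strided Conv3d, as in 3D ViTs.
        self.proj = nn.Conv3d(1, dim, kernel_size=patch, stride=patch)

    def forward(self, vol):                  # vol: (B, 1, D, H, W)
        x = self.proj(vol)                   # (B, dim, D', H', W')
        return x.flatten(2).transpose(1, 2)  # (B, N_patches, dim)

class MultimodalQA(nn.Module):
    """Fuses CT patch tokens with prompt tokens and scores answer tokens."""
    def __init__(self, vocab=30522, dim=256, heads=8, layers=4):
        super().__init__()
        self.ct = CTPatchEncoder(dim=dim)
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(dim, vocab)    # token logits over the vocabulary

    def forward(self, vol, prompt_ids):      # prompt_ids: (B, T) token IDs
        tokens = torch.cat([self.ct(vol), self.embed(prompt_ids)], dim=1)
        fused = self.fusion(tokens)
        # Score only the text positions (the last T fused tokens).
        return self.head(fused[:, -prompt_ids.size(1):])

# Toy usage: clinical data and the question share one free-text prompt,
# e.g. "Age: 63. Pack-years: 30. Q: What is the lung cancer risk?"
model = MultimodalQA()
vol = torch.randn(2, 1, 32, 128, 128)           # two toy chest CT volumes
prompt_ids = torch.randint(0, 30522, (2, 64))   # tokenized prompt + answer
logits = model(vol, prompt_ids)                 # (2, 64, vocab)
loss = nn.functional.cross_entropy(             # shifted next-token loss
    logits[:, :-1].reshape(-1, logits.size(-1)),
    prompt_ids[:, 1:].reshape(-1))
print(logits.shape, loss.item())
```

Under this formulation, adding a new task needs no new output head: the task is expressed as a different question in the prompt, and the answer is decoded as text, which is what allows one model to serve many clinical tasks.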