Artificial intelligence (AI) systems hold great promise to improve healthcare over the coming decades. Specifically, AI systems leveraging multiple data sources and input modalities are poised to become a viable method to deliver more accurate results and deployable pipelines across a wide range of applications. In this work, we propose and evaluate a unified Holistic AI in Medicine (HAIM) framework to facilitate the generation and testing of AI systems that leverage multimodal inputs. Our approach uses generalizable data pre-processing and machine learning modeling stages that can be readily adapted for research and deployment in healthcare environments. We evaluate our HAIM framework by training and characterizing 14,324 independent models based on HAIM-MIMIC-MM, a multimodal clinical database (N=34,537 samples) containing 7,279 unique hospitalizations and 6,485 patients, spanning all possible input combinations of 4 data modalities (i.e., tabular, time-series, text, and images), 11 unique data sources and 12 predictive tasks. We show that this framework can consistently and robustly produce models that outperform similar single-source approaches across various healthcare demonstrations (by 6-33%), including 10 distinct chest pathology diagnoses, along with length-of-stay and 48-hour mortality predictions. We also quantify the contribution of each modality and data source using Shapley values, which demonstrates the heterogeneity in data modality importance and the necessity of multimodal inputs across different healthcare-relevant tasks. The generalizable properties and flexibility of the HAIM framework could offer a promising pathway for future multimodal predictive systems in clinical and operational healthcare settings.
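To make the Shapley-value attribution mentioned above concrete, the following is a minimal sketch of an exact Shapley computation over the four data modalities. It is not the HAIM implementation: the function names (`shapley_values`, `toy_score`) and the scores in `TOY_GAIN` are hypothetical placeholders chosen for illustration, and they are not results from the paper. In practice, the score of a subset would presumably be a performance metric of a model trained on that combination of inputs.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a set function `value` over `players`.

    `value` takes a frozenset of players and returns a number, for example
    the test-set score of a model trained on that subset of modalities.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s.
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


# The four modalities named in the abstract.
MODALITIES = ["tabular", "time_series", "text", "image"]

# Hypothetical per-modality gains, invented purely for this sketch.
TOY_GAIN = {"tabular": 0.05, "time_series": 0.10, "text": 0.08, "image": 0.12}


def toy_score(subset):
    """Illustrative additive score: a 0.5 baseline plus a gain per modality."""
    return 0.5 + sum(TOY_GAIN[m] for m in subset)


phi = shapley_values(MODALITIES, toy_score)
for modality in MODALITIES:
    print(f"{modality}: {phi[modality]:.3f}")
```

Because the toy score is additive, each modality's Shapley value equals its own gain, which makes the sketch easy to check by hand. With real model scores the contributions would generally interact. The exact computation evaluates every subset, so it stays cheap for 4 modalities but grows exponentially with the number of inputs.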