Multi-modal fusion approaches aim to integrate information from different data sources. Unlike natural datasets, such as those in audio-visual applications, where samples consist of "paired" modalities, data in healthcare is often collected asynchronously. Hence, requiring the presence of all modalities for a given sample is unrealistic for clinical tasks and significantly limits the size of the dataset during training. In this paper, we propose MedFuse, a conceptually simple yet promising LSTM-based fusion module that can accommodate uni-modal as well as multi-modal input. We evaluate the fusion method and introduce new benchmark results for in-hospital mortality prediction and phenotype classification, using clinical time-series data from the MIMIC-IV dataset and corresponding chest X-ray images from MIMIC-CXR. Compared to more complex multi-modal fusion strategies, MedFuse improves performance by a large margin on the fully paired test set. It also remains robust on the partially paired test set, which contains samples with missing chest X-ray images. We release our code for reproducibility and to enable the evaluation of competing models in the future.
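To make the idea of an LSTM-based fusion module that accepts both uni-modal and multi-modal input concrete, the following is a minimal, hypothetical PyTorch sketch, not the released implementation. It assumes that each modality is first reduced to a fixed-size feature vector (names such as ehr_proj, cxr_proj, and the dimensions are illustrative): the available modality embeddings are stacked as a short sequence and aggregated by an LSTM, so a sample missing the chest X-ray simply produces a shorter sequence.

```python
import torch
import torch.nn as nn

class LSTMFusionSketch(nn.Module):
    """Illustrative LSTM-based fusion over modality embeddings.

    Hypothetical sketch: names, dimensions, and encoders are assumptions,
    not the authors' released MedFuse code.
    """

    def __init__(self, ehr_dim=76, cxr_dim=512, hidden_dim=256, num_classes=1):
        super().__init__()
        # Project each modality's feature vector into a shared embedding space.
        self.ehr_proj = nn.Linear(ehr_dim, hidden_dim)
        self.cxr_proj = nn.Linear(cxr_dim, hidden_dim)
        # The LSTM fuses the sequence of modality embeddings (length 1 or 2).
        self.fusion_lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, ehr_feat, cxr_feat=None):
        # ehr_feat: (batch, ehr_dim) summary of the clinical time series
        # cxr_feat: (batch, cxr_dim) image features, or None for unpaired samples
        tokens = [self.ehr_proj(ehr_feat)]
        if cxr_feat is not None:
            tokens.append(self.cxr_proj(cxr_feat))
        seq = torch.stack(tokens, dim=1)      # (batch, 1 or 2, hidden_dim)
        _, (h_n, _) = self.fusion_lstm(seq)   # last hidden state fuses the modalities
        return self.classifier(h_n[-1])       # (batch, num_classes) logits
```

Under this sketch, the same module handles paired samples (both embeddings passed) and unpaired samples (only the clinical embedding passed), which is how a single model can be trained on a dataset where chest X-rays are missing for many stays.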