Language modeling has shown impressive progress in generating compelling text with good accuracy and high semantic coherence. An interesting research direction is to augment these powerful models for specific applications using contextual information. In this work, we explore multi-modal language modeling for healthcare applications. We are interested in outcome prediction and patient triage in hospital emergency departments, based on the text of chief complaints and the vital signs recorded at triage. We adapt Perceiver, a modality-agnostic transformer-based model that has shown promising results in several applications. Since the vital-sign modality is represented in tabular format, we modified the Perceiver position encoding to ensure permutation invariance. We evaluated the multi-modal language model on the task of diagnosis code prediction using 120K visits from the MIMIC-IV ED dataset. In the experimental analysis, we show that multi-modality improves prediction performance compared with models trained solely on text or on vital signs. We identified the disease categories for which multi-modality leads to performance improvement and show that, for these categories, vital signs have added predictive power. By analyzing the cross-attention layer, we show how multi-modality contributes to model predictions. This work gives insights into the development of multi-modal language models for healthcare applications.
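To make the permutation-invariance requirement concrete, the following is a minimal sketch and not the implementation used in this work. It assumes one plausible design: each vital sign becomes a token built from its value and a feature-identity embedding (in place of an order-based position encoding), and a single latent query attends over the resulting set. The feature names, dimensions, random parameters, and helper functions (`make_params`, `embed`, `cross_attend`, `encode`) are all hypothetical and chosen only for illustration.

```python
import math
import random


def make_params(feature_names, dim, seed=0):
    """Random stand-ins for learned parameters (illustrative only)."""
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    return {
        "feature_emb": {name: vec() for name in feature_names},  # one vector per column
        "value_proj": vec(),      # projects a scalar value into the token space
        "latent_query": vec(),    # a single latent query vector
    }


def embed(record, params):
    """Build one token per (feature, value) pair.

    The token depends on WHICH feature it is, not on WHERE it appears in the
    record, so no order-based position encoding is involved.
    """
    tokens = []
    for name, value in record:
        emb = params["feature_emb"][name]
        tokens.append([value * w + e for w, e in zip(params["value_proj"], emb)])
    return tokens


def cross_attend(tokens, query):
    """Single-head cross-attention of one query over a set of tokens."""
    dim = len(query)
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(dim) for tok in tokens]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) / total for i in range(dim)]


def encode(record, params):
    return cross_attend(embed(record, params), params["latent_query"])


# Hypothetical, already-normalized vital signs for one visit.
vitals = [("temperature", 0.12), ("heart_rate", 0.80), ("resp_rate", 0.35),
          ("o2_sat", -0.40), ("sbp", 0.05), ("dbp", -0.10)]
params = make_params([name for name, _ in vitals], dim=8)

original = encode(vitals, params)
shuffled = encode(list(reversed(vitals)), params)

# The two encodings agree up to floating-point rounding.
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original, shuffled)))
```

Because the softmax-weighted sum runs over a set of tokens whose content carries the feature identity, reordering the columns of the vital-sign table leaves the encoding unchanged in this sketch. An order-based position encoding would break that property, since the same measurement would receive a different encoding depending on its column index.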