Multiple Sclerosis (MS) is a chronic disease that develops in the human brain and spinal cord and can cause permanent damage to or deterioration of the nerves. The severity of MS is monitored with the Expanded Disability Status Scale (EDSS), which is composed of several functional sub-scores. Early and accurate classification of MS severity is critical for slowing or preventing disease progression through early therapeutic intervention. Recent advances in deep learning and the wide use of Electronic Health Records (EHR) create opportunities to apply data-driven predictive modeling tools toward this goal. Previous studies using single-modal machine learning and deep learning algorithms were limited in prediction accuracy by insufficient data or overly simple models. In this paper, we propose using patients' multimodal longitudinal EHR data to predict MS disease severity at the time of the hospital visit. This work makes two important contributions. First, we describe a pilot effort to leverage structured EHR data, neuroimaging data, and clinical notes to build a multimodal deep learning framework that predicts a patient's MS disease severity. The proposed pipeline demonstrates up to a 25% increase in the Area Under the Receiver Operating Characteristic curve (AUROC) compared to models using single-modal data. Second, the study provides insights into the amount of useful signal embedded in each data modality with respect to MS disease prediction, which may improve data collection processes.