One important challenge of applying deep learning to electronic health records (EHR) is the complexity of their multimodal structure. EHR usually contains a mixture of structured (codes) and unstructured (free-text) data with sparse and irregular longitudinal features -- all of which doctors utilize when making decisions. In the deep learning regime, determining how different modality representations should be fused together is a difficult problem, which is often addressed by handcrafted modeling and intuition. In this work, we extend state-of-the-art neural architecture search (NAS) methods and propose MUltimodal Fusion Architecture SeArch (MUFASA) to simultaneously search across multimodal fusion strategies and modality-specific architectures for the first time. We demonstrate empirically that our MUFASA method outperforms established unimodal NAS on public EHR data with comparable computation costs. In addition, MUFASA produces architectures that outperform Transformer and Evolved Transformer. Compared with these baselines on CCS diagnosis code prediction, our discovered models improve top-5 recall from 0.88 to 0.91 and demonstrate the ability to generalize to other EHR tasks. Studying our top architecture in depth, we provide empirical evidence that MUFASA's improvements are derived from its ability to both customize modeling for each data modality and find effective fusion strategies.