The substantial increase in the use of Electronic Health Records (EHRs) has opened new frontiers for predictive healthcare. However, while EHR systems are nearly ubiquitous, they lack a unified code system for representing medical concepts. The heterogeneous formats of EHRs present a substantial barrier to training and deploying state-of-the-art deep learning models at scale. To overcome this problem, we introduce Description-based Embedding (DescEmb), a code-agnostic, description-based representation learning framework for predictive modeling on EHRs. DescEmb takes advantage of the flexibility of neural language understanding models while remaining a neutral approach that can be combined with prior frameworks for task-specific representation learning or predictive modeling. We evaluated our model in a range of experiments, including prediction tasks, transfer learning, and pooled learning. Overall, DescEmb achieves higher performance than the code-based approach across these experiments, opening the door to a text-based approach to predictive healthcare research that is constrained neither by EHR structure nor by specialized domain knowledge.
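
The contrast between code-based and description-based representations can be illustrated with a minimal sketch. It is not the DescEmb implementation: a hashed bag-of-words average stands in for a neural text encoder, and the two codes, their shared description, and all function names are hypothetical. It only shows why two EHR systems that assign different codes to the same concept can still share a representation when the input is the textual description.

```python
import hashlib
import math

DIM = 16  # toy embedding width (assumption, chosen only for illustration)


def _hash_vector(token: str, dim: int = DIM) -> list[float]:
    """Deterministic pseudo-embedding; stands in for a learned lookup table."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Map each byte (0..255) to a value in [-1, 1].
    return [(b / 127.5) - 1.0 for b in digest[:dim]]


def code_embedding(code: str) -> list[float]:
    """Code-based view: the whole code is one opaque symbol."""
    return _hash_vector("CODE::" + code)


def description_embedding(description: str) -> list[float]:
    """Description-based view: average of word vectors.

    A crude stand-in for a neural text encoder (assumption).
    """
    tokens = description.lower().split()
    if not tokens:
        raise ValueError("description must contain at least one word")
    vectors = [_hash_vector("WORD::" + t) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical records: two systems, different codes, same concept.
system_a = {"code": "A-1001", "description": "glucose level in blood"}
system_b = {"code": "B-77", "description": "glucose level in blood"}

code_sim = cosine(code_embedding(system_a["code"]),
                  code_embedding(system_b["code"]))
desc_sim = cosine(description_embedding(system_a["description"]),
                  description_embedding(system_b["description"]))

print(f"code-based similarity:        {code_sim:.3f}")
print(f"description-based similarity: {desc_sim:.3f}")
```

In this toy setting the two code embeddings are unrelated vectors, whereas the description embeddings coincide because the text is the same. A learned text encoder would additionally be expected to place differently worded but related descriptions near each other, which this sketch does not attempt to show.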