Healthcare representation learning on Electronic Health Records (EHRs) is seen as crucial for predictive analytics in the medical field. Many natural language processing techniques, such as word2vec, RNNs, and self-attention, have been adapted to hierarchical, time-stamped EHR data, but they fail when either general or task-specific data is lacking. Hence, some recent works train healthcare representations by incorporating a medical ontology (a.k.a. knowledge graph) through self-supervised tasks such as diagnosis prediction. However, (1) the small-scale, monotonous ontology is insufficient for robust learning, and (2) the critical contexts and dependencies underlying patient journeys are never exploited to enhance ontology learning. To address these limitations, we propose an end-to-end, robust, Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO), for healthcare representation learning and predictive analytics. Specifically, it consists of a task-specific representation learning module and a graph-embedding module that learn the patient journey and the medical ontology interactively. This creates a mutual integration that benefits both healthcare representation learning and medical ontology embedding. The integration is achieved by jointly training a task-specific predictive task and an ontology-based disease typing task on the fused embeddings of the two modules. Experiments on two real-world diagnosis prediction datasets show that MIMO not only achieves better predictive results than previous state-of-the-art approaches, whether training data is sufficient or insufficient, but also derives more interpretable diagnosis embeddings.