Electronic Health Records (EHR) are generated during routine clinical care and record valuable information about broad patient populations, providing plentiful opportunities to improve patient management and intervention strategies in clinical practice. To exploit this potential, a popular machine learning paradigm for EHR data analysis is EHR representation learning, which first uses a backbone model to learn informative representations from each individual patient's EHR data and then supports diverse healthcare downstream tasks built on those representations. Unfortunately, this paradigm does not support in-depth analysis of the relevance among patients, which in clinical practice is generally carried out through cohort studies. Specifically, patients in the same cohort tend to share similar characteristics, implying that they resemble one another in medical conditions such as symptoms or diseases. In this paper, we propose a universal COhort Representation lEarning (CORE) framework that augments EHR utilization by leveraging fine-grained cohort information among patients. In particular, CORE first develops an explicit patient modeling task based on prior knowledge of patients' diagnosis codes; this task measures the latent relevance among patients and adaptively divides cohorts for each patient. Based on the constructed cohorts, CORE recodes the pre-extracted EHR data representations from intra- and inter-cohort perspectives, yielding augmented EHR data representations. CORE is readily applicable to diverse backbone models and serves as a universal plug-in framework that infuses cohort information into healthcare methods for improved performance. We conduct an extensive experimental evaluation on two real-world datasets, and the results demonstrate the effectiveness and generalizability of CORE.
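To make the cohort idea concrete, the following is a minimal illustrative sketch and not the CORE method itself. It assumes, purely for illustration, that patient relevance is approximated by Jaccard similarity over diagnosis-code sets (the abstract describes a learned patient modeling task instead), that each patient's cohort is every other patient above a hypothetical `threshold`, and that the intra- and inter-cohort views are simple means concatenated onto the backbone representation. The function names `jaccard_matrix`, `build_cohorts`, and `augment` are hypothetical.

```python
import numpy as np


def jaccard_matrix(code_sets):
    """Pairwise Jaccard similarity between patients' diagnosis-code sets.

    Assumption: a fixed set-overlap score stands in for the learned
    patient relevance described in the abstract.
    """
    n = len(code_sets)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            union = code_sets[i] | code_sets[j]
            if union:
                sim[i, j] = len(code_sets[i] & code_sets[j]) / len(union)
    return sim


def build_cohorts(sim, threshold=0.3):
    """For each patient, collect the other patients whose similarity is at
    least `threshold` (a hypothetical hyperparameter)."""
    n = sim.shape[0]
    return [
        [j for j in range(n) if j != i and sim[i, j] >= threshold]
        for i in range(n)
    ]


def augment(reps, cohorts):
    """Concatenate each backbone representation with an intra-cohort mean
    and an inter-cohort (everyone outside the cohort) mean.

    Assumption: mean pooling and concatenation are illustrative choices only.
    """
    n, d = reps.shape
    out = np.zeros((n, 3 * d))
    for i, members in enumerate(cohorts):
        outside = [j for j in range(n) if j != i and j not in members]
        # Fall back to the patient's own vector when a group is empty.
        intra = reps[members].mean(axis=0) if members else reps[i]
        inter = reps[outside].mean(axis=0) if outside else reps[i]
        out[i] = np.concatenate([reps[i], intra, inter])
    return out


# Toy data: four patients with made-up diagnosis-code sets and
# one-hot vectors standing in for pre-extracted backbone representations.
code_sets = [{"I10", "E11"}, {"I10", "E11", "N18"}, {"J45"}, {"J45", "J30"}]
reps = np.eye(4)

cohorts = build_cohorts(jaccard_matrix(code_sets))
augmented = augment(reps, cohorts)
```

Because `augment` only consumes pre-extracted representations, the sketch mirrors the plug-in character described above: any backbone that outputs one vector per patient could supply `reps`.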