A meaningful understanding of clinical protocols and patient pathways helps improve healthcare outcomes. Electronic health records (EHRs) reflect real-world treatment behaviours and are used to enhance healthcare management, but they present challenges: protocols and pathways are often loosely defined, and their elements are frequently not recorded in EHRs, which complicates this enhancement. To address this challenge, the healthcare objectives associated with healthcare management activities can be observed indirectly in EHRs as latent topics. Topic models, such as Latent Dirichlet Allocation (LDA), are used to identify latent patterns in EHR data. However, they do not model the ordered nature of EHR sequences, nor do they assess individual events in isolation. Our novel approach, the Categorical Sequence Encoder (CaSE), addresses these shortcomings. CaSE's event-level representations capture the sequential nature of EHRs and reveal latent healthcare objectives. On synthetic EHR sequences, CaSE outperforms LDA by up to 37% at identifying healthcare objectives. On the real-world MIMIC-III dataset, CaSE identifies meaningful representations that could substantially enhance protocol and pathway development.
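To illustrate the ordering limitation, here is a minimal Python sketch, assuming the bag-of-events (document-term count) input that LDA-style topic models consume. Two hypothetical patient sequences containing the same events in a different order receive identical counts, whereas even a simple order-aware feature (adjacent event pairs) tells them apart. The event names and the helpers `bag_of_events` and `ordered_bigrams` are hypothetical, and this is not an implementation of CaSE.

```python
from collections import Counter

# Hypothetical event sequences for two patients: same events, different order.
# The event names are illustrative placeholders, not real MIMIC-III codes.
seq_a = ["admit", "lab_test", "antibiotic", "discharge"]
seq_b = ["admit", "antibiotic", "lab_test", "discharge"]


def bag_of_events(sequence):
    """Count each event, discarding order (the input an LDA-style model sees)."""
    return Counter(sequence)


def ordered_bigrams(sequence):
    """List adjacent event pairs, a minimal representation that keeps order."""
    return list(zip(sequence, sequence[1:]))


# Order-free counts cannot distinguish the two sequences.
print(bag_of_events(seq_a) == bag_of_events(seq_b))  # True
# Order-aware pairs can.
print(ordered_bigrams(seq_a) == ordered_bigrams(seq_b))  # False
```

Bigrams are used here only because they are the simplest order-aware feature; the abstract's claim is that richer event-level sequence representations are needed to recover latent healthcare objectives.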