Electronic Health Records (EHRs) are extremely sparse: only a small proportion of events (symptoms, diagnoses, and treatments) are observed in the lifetime of an individual. The high degree of missingness in EHRs can be attributed to many factors, including device failure, privacy concerns, and other unexpected reasons. Unfortunately, many traditional imputation methods are not well suited to highly sparse data and scale poorly to high-dimensional datasets. In this paper, we propose a graph-based imputation method that is robust both to sparsity and to unreliable unmeasured events. Our approach compares favourably to several standard and state-of-the-art imputation methods in terms of performance and runtime. Moreover, results indicate that the model learns to embed different event types in a clinically meaningful way. Our work can facilitate the diagnosis of novel diseases based on the clinical history of past events, with the potential to increase our understanding of the landscape of comorbidities.