We address the problem of predicting when a disease will develop, i.e., the medical event time (MET), from a patient's electronic health record (EHR). The MET of non-communicable diseases such as diabetes is highly correlated with cumulative health conditions, specifically with how much time the patient has spent in particular health conditions in the past. The common time-series representation extracts such information from EHR only indirectly, because it focuses on detailed dependencies between values in successive observations rather than on cumulative information. We propose a novel data representation for EHR called cumulative stay-time representation (CTR), which directly models such cumulative health conditions. We derive a trainable construction of CTR based on neural networks that is flexible enough to fit the target data and scalable enough to handle high-dimensional EHR. Numerical experiments on synthetic and real-world datasets demonstrate that CTR alone achieves high prediction performance and that it improves the performance of existing models when combined with them.
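To make the notion of cumulative stay time concrete, the following is a minimal sketch for a single attribute with hand-picked discrete states. It is not the trainable neural-network construction described above. The function name `cumulative_stay_time`, the bin thresholds, and the sample record are all hypothetical, and the sketch assumes that an observed value holds until the next observation.

```python
import numpy as np

def cumulative_stay_time(times, values, bin_edges):
    """Total time spent in each discretized state of one attribute.

    Assumption (not from the source): an observed value holds until the
    next observation, and the last observation contributes no stay time.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    durations = np.diff(times)                     # time between consecutive observations
    states = np.digitize(values[:-1], bin_edges)   # state index for each interval
    n_states = len(bin_edges) + 1
    # Sum the durations that fall into each state.
    return np.bincount(states, weights=durations, minlength=n_states)

# Hypothetical record: observation times in years, blood glucose in mg/dL.
times = [0.0, 1.0, 3.0, 4.0, 6.0]
glucose = [95.0, 110.0, 130.0, 105.0, 140.0]
edges = [100.0, 126.0]  # illustrative thresholds only: low / middle / high

ctr = cumulative_stay_time(times, glucose, edges)
print(ctr)  # years spent in the low, middle, and high state
```

The resulting vector discards the order of observations and keeps only how long the patient stayed in each state, which is the kind of cumulative information a step-by-step time-series model has to recover indirectly.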