Sequential diagnosis prediction on Electronic Health Records (EHR) has proven crucial for predictive analytics in the medical domain. EHR data, the sequential records of a patient's interactions with healthcare systems, is inherently characterized by temporality, irregularity, and data insufficiency. Some recent works train healthcare predictive models using the sequential information in EHR data, but they remain vulnerable to the irregular, temporal nature of that data, including hospital admission and discharge states, and to insufficient data. To address these issues, we propose SETOR, a robust end-to-end transformer-based model that uses neural ordinary differential equations to handle both the irregular intervals between a patient's visits (given their admission timestamps) and the length of stay in each visit, integrates medical ontology to alleviate the limitation of insufficient data, and employs multi-layer transformer blocks to capture the dependencies between the patient's visits. Experiments on two real-world healthcare datasets show that SETOR not only achieves better predictive results than previous state-of-the-art approaches, whether training data is sufficient or insufficient, but also derives more interpretable embeddings of medical codes. The experimental code is available on GitHub (https://github.com/Xueping/SETOR).