Deep learning for temporal data representation in electronic health records: A systematic review of challenges and methodologies

Objective: Temporal electronic health records (EHRs) are a rich source of information for secondary uses such as clinical event prediction and chronic disease management. However, representing temporal data remains challenging. We therefore sought to identify these challenges and to evaluate novel methodologies for addressing them through a systematic review of deep learning solutions.

Methods: We searched five databases (PubMed, EMBASE, the Institute of Electrical and Electronics Engineers [IEEE] Xplore Digital Library, the Association for Computing Machinery [ACM] Digital Library, and Web of Science), complemented by hand-searching the proceedings of several prestigious computer science conferences. We sought articles published from January 1, 2010, to August 30, 2020, that reported deep learning methodologies for temporal data representation in structured EHR data. We summarized and analyzed the selected articles from three perspectives: nature of time series, methodology, and model implementation.

Results: We included 98 articles on temporal data representation using deep learning. We identified four major challenges: data irregularity, data heterogeneity, data sparsity, and model opacity. We then studied how deep learning techniques were applied to address these challenges. Finally, we discuss some open challenges arising from deep learning.

Conclusion: Temporal EHR data present several major challenges for clinical prediction modeling and data utilization. Current deep learning solutions can address these challenges to some extent. Future studies could consider designing comprehensive and integrated solutions. Moreover, researchers should incorporate additional clinical domain knowledge into study designs and enhance model interpretability to facilitate implementation in clinical practice.
