Heterogeneous electronic medical record representation for similarity computing

With the widespread use of tools and advances in text processing techniques, the size and range of clinical data are no longer limited to structured data. The rapid growth of recorded information has led to big data platforms in healthcare that can be used to improve patients' primary care and to serve various secondary purposes. Patient similarity assessment is one such secondary task: it identifies patients who are similar to a given patient and helps derive insights from their records to provide better treatment. This type of assessment is based on calculating the distance between patients. Since representing patients and calculating their similarity plays an essential role in many secondary uses of electronic records, this article examines a new data representation method for Electronic Medical Records (EMRs) that takes into account the information in clinical narratives for similarity computing. Some previous works rely only on structured data, while others use only unstructured data. However, a comprehensive representation of the information contained in the EMR requires the effective aggregation of both. To address the limitations of previous methods, we propose a method that captures the co-occurrence of different medical events, including signs, symptoms, and diseases, extracted from both unstructured and structured data. It integrates these data as discriminative features to construct a temporal tree, taking into account the difference between events that have short-term and long-term impacts. Our results show that considering signs, symptoms, and diseases in every time interval leads to lower mean squared error (MSE) and higher precision than baseline representations that either omit this information or consider it separately from structured data.
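The idea of distance-based patient similarity over co-occurring events can be illustrated with a minimal sketch. This is not the authors' temporal tree. It only assumes that each patient is described by a set of events per time interval, mixing structured entries with signs, symptoms, and diseases extracted from narratives. The interval labels, event names, and the choice of cosine distance are all hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_features(intervals):
    """Mark every pair of events that co-occur in the same time interval.

    `intervals` maps an interval label (hypothetical, e.g. "short_term")
    to the set of events observed in it. The interval label is part of
    the feature key, so the same pair in different intervals is kept apart.
    """
    features = Counter()
    for label, events in intervals.items():
        for a, b in combinations(sorted(events), 2):
            features[(label, a, b)] += 1
    return features


def cosine_distance(u, v):
    """Cosine distance between two sparse feature vectors (1.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical patients; "lab:" and "drug:" mark structured entries,
# the rest stand in for events extracted from clinical narratives.
patient_a = {
    "short_term": {"fever", "cough", "lab:high_wbc"},
    "long_term": {"diabetes", "drug:metformin"},
}
patient_b = {
    "short_term": {"fever", "cough"},
    "long_term": {"diabetes", "drug:metformin"},
}
patient_c = {
    "short_term": {"chest_pain", "lab:high_troponin"},
    "long_term": {"hypertension"},
}

fa, fb, fc = (cooccurrence_features(p) for p in (patient_a, patient_b, patient_c))
d_ab = cosine_distance(fa, fb)
d_ac = cosine_distance(fa, fc)
print(f"distance(a, b) = {d_ab:.3f}")
print(f"distance(a, c) = {d_ac:.3f}")
```

In this toy setup, patients a and b share co-occurring events in both intervals and end up closer to each other than a and c, which share none. Keeping narrative-derived and structured events in the same per-interval set is what lets pairs such as a symptom and a lab result become a single feature, rather than being handled as separate representations.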
