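Project title: Patient Similarity Learning Methods for Cross-Domain Heterogeneous Data and Their Applications
Project number: No.81671786
Project type: General Program (面上项目)
Year of approval: 2017
Discipline: Medicine and Health
Project author: Chen Hui (陈卉)
Author affiliation: Capital Medical University (首都医科大学)
Funding amount: 250,000 CNY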
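Abstract (translated from Chinese): With the rapid adoption of electronic medical record systems in medical institutions, a large amount of important medical information is now stored electronically in medical information systems. Through continuous accumulation, these systems have produced a vast body of medical big data, which has become a major source for generating medical evidence. Accurately finding patients who are "like me" will greatly promote the generation of effective evidence and its wide application. Because medical big data is heterogeneous, sparse, and noisy, most current patient similarity research is confined to complete data from a single source and of a single type, and depends on the specific application. This study aims to systematically investigate how to learn patient similarity that is effective, accurate, reliable, and adaptable to different applications from various heterogeneous patient data. We propose a progressive patient similarity learning framework: first, construct a similarity matrix for each type of patient information and apply matrix completion to remove data noise and fill in missing data; then, obtain expert feedback (patient labels, pairwise constraints, and relative comparisons) and use supervised learning to improve the accuracy and reliability of the patient similarity. Finally, the patient similarity measures are applied at both the individual patient level and the population level to explore the secondary clinical use of electronic medical record data in the context of big data.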
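Keywords (translated from Chinese): patient similarity;heterogeneous data;active learning;semi-supervised learning;electronic medical records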
English abstract: With the rapid spread of electronic medical record systems in medical institutions, a large amount of important medical information is now stored electronically in medical information systems. Through continuous accumulation, the large volume of medical data produced by these systems has become a major source of medical evidence. Finding “patients like me”, i.e. efficiently creating effective patient similarity measures, will facilitate the generation of effective evidence and its wide use. However, since medical records are usually heterogeneous, sparse and noisy, most existing work on patient similarity addresses complete data from a single homogeneous source and type, and varies depending on the application. The goal of this research project is to systematically investigate how to learn effective, accurate and robust patient similarity measures from various heterogeneous information sources, and how to adapt the learned similarity measures across different applications. We propose an incremental framework for patient similarity learning. We will first construct a similarity measure for each type of information (source/representation) and complete the resulting similarity matrix. Experts' feedback on the patients, in the form of patient labels, pairwise constraints or relative comparisons, will then be obtained to further improve the accuracy and reliability of the patient similarities through supervised learning. We will apply the patient similarity measures across heterogeneous data sources in two types of application scenarios, individual analysis and population analysis, and will explore the reuse of electronic health data in clinical medicine in the era of big data.
English keywords: Patient Similarity;Heterogeneous Data;Active Learning;Semi-supervised Learning;Electronic Medical Records