Semi-supervised Approach to Event Time Annotation Using Longitudinal Electronic Health Records

Large clinical datasets derived from insurance claims and electronic health record (EHR) systems are valuable sources for precision medicine research. These datasets can be used to develop models for personalized prediction of risk or treatment response. Efficiently deriving prediction models from real-world data, however, faces practical and methodological challenges. Precise information on important clinical outcomes, such as time to cancer progression, is not readily available in these databases. The true clinical event times typically cannot be approximated well from simple extracts of billing or procedure codes, while annotating event times manually is prohibitively costly in time and resources. In this paper, we propose a two-step semi-supervised multi-modal automated time annotation (MATA) method leveraging multi-dimensional longitudinal EHR encounter records. In step I, we employ a functional principal component analysis approach to estimate the underlying intensity functions based on observed point processes from the unlabeled patients. In step II, we fit a penalized proportional odds model to the event time outcomes in the labeled data, using the features derived in step I, with the non-parametric baseline function approximated by B-splines. Under regularity conditions, the resulting estimator of the feature effect vector is shown to be root-$n$ consistent. We demonstrate the superiority of our approach relative to existing approaches through simulations and a real data example on annotating lung cancer recurrence in an EHR cohort of lung cancer patients from the Veterans Health Administration.
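
For orientation, the step II model can be written in the generic textbook form of a proportional odds model with a spline-approximated baseline. This is only a sketch under assumed notation: the symbols $T$ (event time), $\mathbf{Z}$ (features derived in step I), $\alpha(\cdot)$ (baseline function), $\boldsymbol{\beta}$ (feature effect vector), $B_k$ and $\gamma_k$ (B-spline basis functions and coefficients), $\ell_n$ (log-likelihood on the labeled data), and $p_\lambda$ (an unspecified penalty) do not appear in the abstract, and the exact likelihood and penalty used by the authors may differ.

```latex
% Proportional odds model (generic form; notation assumed, not taken from the abstract)
\operatorname{logit}\, P(T \le t \mid \mathbf{Z})
  = \alpha(t) + \boldsymbol{\beta}^{\top}\mathbf{Z},
\qquad
\alpha(t) \approx \sum_{k=1}^{K} \gamma_k B_k(t),

% Penalized estimation on the labeled data (penalty form left unspecified)
(\hat{\boldsymbol{\gamma}}, \hat{\boldsymbol{\beta}})
  = \arg\max_{\boldsymbol{\gamma},\, \boldsymbol{\beta}}
    \Big\{ \ell_n(\boldsymbol{\gamma}, \boldsymbol{\beta})
           - \sum_{j} p_{\lambda}\big(|\beta_j|\big) \Big\}.
```

In this form, the root-$n$ consistency stated above concerns the estimator $\hat{\boldsymbol{\beta}}$ of the feature effect vector, with the baseline $\alpha(\cdot)$ treated as a non-parametric nuisance function.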
