With the availability of massive amounts of data from electronic health records and registry databases, incorporating time-varying patient information to improve risk prediction has attracted great attention. To exploit the growing amount of predictor information over time, we develop a unified framework for landmark prediction using survival tree ensembles, in which predictions can be updated as new information becomes available. Compared to conventional landmark prediction with fixed landmark times, our methods allow the landmark times to be subject-specific and triggered by an intermediate clinical event. Moreover, the nonparametric approach circumvents the thorny issue of model incompatibility at different landmark times. In our framework, both the longitudinal predictors and the event time outcome are subject to right censoring, and thus existing tree-based approaches cannot be directly applied. To tackle the analytical challenges, we propose a risk-set-based ensemble procedure that averages martingale estimating equations from individual trees. Extensive simulation studies are conducted to evaluate the performance of our methods. The methods are applied to the Cystic Fibrosis Foundation Patient Registry (CFFPR) data to perform dynamic prediction of lung disease in cystic fibrosis patients and to identify important prognostic factors.