The individual data collected throughout patient follow-up constitute crucial information for assessing the risk of a clinical event and, eventually, for adapting a therapeutic strategy. Joint models and landmark models have been proposed to compute individual dynamic predictions from repeated measures of one or two markers. However, they do not easily extend to the case where the complete patient history includes possibly many more repeated markers. Our objective was thus to propose a solution for the dynamic prediction of a health event that can exploit repeated measures of a possibly large number of markers. We combined a landmark approach extended to endogenous marker history with machine learning methods adapted to survival data. Each marker trajectory is modeled using the information collected up to the landmark time, and summary variables that best capture the individual trajectories are derived. These summaries and additional covariates are then included in different prediction methods. To handle a possibly high-dimensional history, we rely on machine learning methods adapted to survival data, namely regularized regressions and random survival forests, to predict the event from the landmark time, and we show how they can be combined into a superlearner. Performance is then evaluated by cross-validation using estimators of the Brier score and of the area under the receiver operating characteristic (ROC) curve adapted to censored data. We demonstrate in a simulation study the benefits of machine learning survival methods over standard survival models, especially when the predictors have numerous and/or nonlinear relationships with the event. We then applied the methodology in two prediction contexts: a clinical context, with the prediction of death for patients with primary biliary cholangitis, and a public health context, with the prediction of death in the general elderly population at different ages.
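To make two of the steps above concrete, here is a minimal Python sketch. It is not the authors' R implementation, and it rests on stated simplifications. First, a per-subject ordinary least squares line is swapped in for the mixed models used to summarize each marker trajectory: only measurements taken up to the landmark time are used, and the predicted marker level at the landmark and the slope serve as summary variables. Second, the Brier score at a prediction horizon is computed with one standard estimator adapted to censoring, the inverse probability of censoring weighted (IPCW) estimator, where the censoring survival function G is estimated by Kaplan-Meier. The function names are hypothetical. The sketch assumes that follow-up times are measured from the landmark and that only subjects still event-free at the landmark are included. `risk` is each subject's predicted probability of having the event by the horizon.

```python
def landmark_summaries(records, landmark):
    """Summarize each marker trajectory using only data observed up to `landmark`.

    records: dict mapping subject id -> list of (time, value) measurements.
    Returns dict subject id -> (predicted level at landmark, slope).
    Per-subject OLS is a simplification; mixed models would instead shrink
    individual estimates toward the population trajectory.
    """
    out = {}
    for sid, obs in records.items():
        past = [(t, y) for t, y in obs if t <= landmark]  # no look-ahead
        if not past:
            continue
        n = len(past)
        tbar = sum(t for t, _ in past) / n
        ybar = sum(y for _, y in past) / n
        sxx = sum((t - tbar) ** 2 for t, _ in past)
        # With a single distinct time, the slope is not identifiable: use 0.
        slope = 0.0 if sxx == 0 else sum((t - tbar) * (y - ybar) for t, y in past) / sxx
        out[sid] = (ybar + slope * (landmark - tbar), slope)
    return out


def censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survival G(u) = P(C > u).

    Censorings (event == 0) are treated as the 'events' of this estimator.
    Returns G(u, left=False); left=True gives the left limit G(u-).
    """
    steps, surv = [], 1.0
    for u in sorted({t for t, e in zip(times, events) if e == 0}):
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        surv *= 1.0 - censored / at_risk
        steps.append((u, surv))

    def G(u, left=False):
        value = 1.0
        for time, s in steps:
            if time < u or (time == u and not left):
                value = s
            else:
                break
        return value

    return G


def ipcw_brier(times, events, risk, horizon):
    """IPCW estimate of the Brier score at `horizon`.

    Subjects with an event by the horizon have observed status 1 and weight
    1 / G(T-); subjects still followed after the horizon have status 0 and
    weight 1 / G(horizon); subjects censored before the horizon contribute 0.
    """
    G = censoring_survival(times, events)
    total = 0.0
    for t, e, p in zip(times, events, risk):
        if t <= horizon and e == 1:
            total += (1.0 - p) ** 2 / G(t, left=True)
        elif t > horizon:
            total += p ** 2 / G(horizon)
    return total / len(times)
```

For example, `landmark_summaries({"a": [(0, 1.0), (1, 2.0), (2, 3.0), (5, 10.0)]}, 3)` ignores the measurement at time 5 and returns `{"a": (4.0, 1.0)}`. These summaries, computed for every marker, would form the predictor matrix passed to the survival learners. The IPCW weights correct for the fact that subjects censored before the horizon have unknown status. Averaging such scores over cross-validation folds gives the kind of performance estimate described above.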
Our methodology, implemented in R, enables the prediction of an event using the entire longitudinal patient history, even when the number of repeated markers is large. Although introduced with mixed models for the repeated markers and methods for a single right-censored time-to-event, our method can be used with any other appropriate modeling technique for the markers and can easily be extended to a competing risks setting.