Prediction models estimate an individual's risk of an adverse event based on their characteristics. Some prediction models have been used to make treatment decisions, but this is not appropriate when the data on which the model was developed included both individuals who initiated that treatment and individuals who did not. By contrast, predictions under hypothetical interventions are estimates of what a person's risk of an outcome would be if they were to follow a particular treatment strategy, given their individual characteristics. Such predictions can give important input to medical decision making. However, evaluating the predictive performance of interventional predictions is challenging. Standard ways of evaluating predictive performance do not apply, because prediction under interventions involves obtaining predictions of the outcome under conditions that differ from those that are observed for some patients in the validation data. This work describes methods for evaluating the predictive performance of predictions under interventions using longitudinal observational data. We focus on time-to-event outcomes and predictions under treatment strategies that involve sustaining a particular treatment regime over time. We introduce a validation approach using artificial censoring and inverse probability weighting, which involves creating a validation data set that mimics the particular treatment strategy under which predictions are made. We extend measures of calibration, discrimination and overall prediction error to the interventional prediction setting. The methods are evaluated using a simulation study, and the results show that our proposed approach and corresponding measures of predictive performance correctly capture the true predictive performance. The methods are applied to an example in the context of liver transplantation.
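To make the idea of artificial censoring combined with inverse probability weighting concrete, the following is a minimal sketch in Python and not the estimator used in this work. It assumes discrete follow-up intervals, no censoring other than the artificial censoring, and that the probability of following the strategy in each interval (`p_adhere`) has already been estimated by some model. The function names `censor_and_weight` and `weighted_brier`, the data layout, and all numbers are hypothetical and chosen for illustration only.

```python
from math import prod


def censor_and_weight(treatment, p_adhere, event_interval, strategy=1):
    """Return (outcome, weight) for one person, or None if artificially censored.

    treatment[k]    treatment actually received in interval k (0 or 1)
    p_adhere[k]     ASSUMED probability of following the strategy in interval k,
                    given the person's history (taken as already estimated)
    event_interval  index of the interval in which the event occurred, or None
    """
    horizon = len(treatment)
    had_event = event_interval is not None and event_interval < horizon
    # The person must follow the strategy up to the event, or up to the horizon.
    last = event_interval if had_event else horizon - 1
    for k in range(last + 1):
        if treatment[k] != strategy:
            return None  # deviated from the strategy: artificially censored
    # Weight by the inverse probability of having remained on the strategy.
    weight = 1.0 / prod(p_adhere[: last + 1])
    return (1 if had_event else 0), weight


def weighted_brier(people, predictions, strategy=1):
    """Weighted mean squared difference between outcome and predicted risk."""
    numerator = 0.0
    denominator = 0.0
    for person, predicted_risk in zip(people, predictions):
        result = censor_and_weight(*person, strategy=strategy)
        if result is None:
            continue  # censored people contribute only through others' weights
        outcome, weight = result
        numerator += weight * (outcome - predicted_risk) ** 2
        denominator += weight
    if denominator == 0.0:
        raise ValueError("nobody in the validation data followed the strategy")
    return numerator / denominator


# Hypothetical validation data: (treatment, p_adhere, event_interval) per person.
people = [
    ([1, 1], [0.5, 0.5], None),  # stays on treatment, no event by the horizon
    ([1, 0], [0.5, 0.5], None),  # stops treatment in interval 1: censored
    ([1, 1], [0.8, 0.5], 0),     # event in interval 0 while on treatment
]
# Hypothetical predicted risks under the "always treated" strategy.
predictions = [0.2, 0.2, 0.6]

score = weighted_brier(people, predictions)
print(round(score, 4))
```

In this sketch, people who deviate from the strategy are dropped from the sum, and those who remain are up-weighted so that the weighted sample stands in for a population in which everyone followed the strategy. With real data, censoring for other reasons and continuous event times would need additional handling.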