Prediction models provide an individual's risk of an adverse event based on their characteristics. Some prediction models have been used to make treatment decisions, but this is not appropriate when the data on which the model was developed included both individuals who initiated that treatment and individuals who did not. By contrast, predictions under hypothetical interventions are estimates of what a person's risk of an outcome would be if they were to follow a particular treatment strategy, given their individual characteristics. Such predictions can give important input to medical decision making. However, evaluating the predictive performance of interventional predictions is challenging. Standard ways of evaluating predictive performance do not apply, because prediction under interventions involves obtaining predictions of the outcome under conditions that differ from those observed for some patients in the validation data. This work describes methods for evaluating the predictive performance of predictions under interventions using longitudinal observational data. We focus on time-to-event outcomes and predictions under treatment strategies that involve sustaining a particular treatment regime over time. We introduce a validation approach using artificial censoring and inverse probability weighting, which involves creating a validation data set that mimics the particular treatment strategy under which predictions are made. We extend measures of calibration, discrimination and overall prediction error to the interventional prediction setting. The methods are evaluated in a simulation study, and the results show that our proposed approach and the corresponding measures of predictive performance correctly capture the true predictive performance. The methods are applied to an example in the context of liver transplantation.
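As a rough illustration of the artificial censoring and inverse probability weighting idea described above, the following minimal Python sketch processes one patient's visit history under a sustained "always treated" strategy. It is not the authors' implementation. The `Visit` class, the `censor_and_weight` function, the discrete-visit layout, and the `p_adhere` values (which would in practice come from a fitted treatment model) are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Visit:
    treated: bool    # treatment status observed at this visit
    p_adhere: float  # ASSUMED: modelled probability of following the
                     # strategy at this visit, given the patient's history


def censor_and_weight(visits: List[Visit],
                      strategy_treated: bool = True) -> Tuple[int, List[float]]:
    """Artificially censor a patient at the first visit where they deviate
    from the strategy, and return the cumulative (unstabilized) inverse
    probability weight for each visit retained before that point."""
    weights: List[float] = []
    cumulative = 1.0
    for visit in visits:
        if visit.treated != strategy_treated:
            break  # artificial censoring: drop this visit and all later ones
        cumulative *= 1.0 / visit.p_adhere
        weights.append(cumulative)
    return len(weights), weights


# Hypothetical patient: treated at the first two visits, untreated at the third.
history = [Visit(True, 0.5), Visit(True, 0.8), Visit(False, 0.9)]
n_kept, weights = censor_and_weight(history)
print(n_kept, weights)
```

In this sketch the retained, weighted person-time stands in for a population in which everyone followed the strategy; weighted versions of calibration, discrimination and prediction error measures would then be computed on it.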