A treatment benefit predictor (TBP) maps patient characteristics to an estimate of the treatment benefit for that patient, which can support optimizing treatment decisions. However, evaluating the predictive performance of a TBP is challenging, as it must often be conducted in a sample where treatment assignment is not random. We show conceptually how to approach validating a pre-specified TBP using observational data from the target population, in the context of a binary treatment decision at a single time point. We exemplify with a particular measure of discrimination (the concentration of benefit index) and a particular measure of calibration (the moderate calibration curve). The population-level definitions of these metrics involve the latent (counterfactual) treatment benefit variable, but we show identification by re-expressing the respective estimands in terms of the distribution of observable data only. We also show that, in the absence of full confounding control, bias propagates in a more complex manner than when targeting more commonly encountered estimands (such as the average treatment effect, or the average treatment effect amongst the treated). Our findings reveal that the patterns of bias are often unpredictable and underscore the necessity of accounting for confounding factors when evaluating TBPs.