Personalized medicine constitutes a growing area of research that benefits from the many new developments in statistical learning. A key domain concerns the prediction of individualized treatment effects, and models for this purpose are increasingly common in the published literature. Aiming to facilitate the validation of prediction models for individualized treatment effects, we extend the classical concepts of discrimination and calibration performance to assess causal (rather than associative) prediction models. Working within the potential outcomes framework, we first evaluate properties of existing statistics (including the c-for-benefit) and subsequently propose novel model-based statistics. The main focus is on randomized trials with binary endpoints. We use simulated data to provide insight into the characteristics of discrimination and calibration statistics, and further illustrate all methods in a trial of acute ischemic stroke treatment. Results demonstrate that the proposed model-based statistics had the best characteristics in terms of bias and variance. While resampling methods to adjust for optimism of performance estimates in the development data were effective on average, they had a high variance across replications that limits their accuracy in any particular applied analysis. Therefore, individualized treatment effect models are best validated in external data rather than in the original development sample.