Risk models are becoming ubiquitous in healthcare, where they may guide interventions by giving practitioners insight from patient data. If a model is updated after the interventions it has guided, those interventions alter the outcomes it is retrained on, and the model may undermine its own predictive performance. The use of a 'holdout set' (a subset of the population that does not receive interventions guided by the model) has been proposed to prevent this. Since patients in the holdout set do not benefit from risk predictions, its size must balance maximising model performance against minimising the number of held-out patients. By defining a general loss function, we prove the existence and uniqueness of an optimal holdout set size, and introduce parametric and semi-parametric algorithms for its estimation. We demonstrate their use on a recent risk score for pre-eclampsia. Based on these results, we argue that a holdout set is a safe, viable and easily implemented solution to the model update problem.
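To make the trade-off concrete, the sketch below scans candidate holdout sizes and picks the one with the smallest total cost. It is a minimal illustration under stated assumptions, not the estimation algorithm described above: it assumes a population of `N` patients, that each held-out patient incurs a fixed baseline cost `k1`, and that each of the remaining `N - n` patients incurs a cost `k2(n)` that falls as the holdout size `n` grows, following a hypothetical power-law learning curve `a * n**(-b) + c`. All of these names and parameter values are illustrative assumptions.

```python
import numpy as np

def model_guided_cost(n, a, b, c):
    """Hypothetical power-law learning curve: expected cost per patient
    treated under a model trained on a holdout set of size n (assumption)."""
    return a * np.power(n, -b) + c

def total_loss(n, N, k1, a, b, c):
    # Held-out patients each incur the baseline cost k1; the other N - n
    # patients each incur the model-guided cost k2(n).
    return k1 * n + model_guided_cost(n, a, b, c) * (N - n)

def optimal_holdout_size(N, k1, a, b, c):
    # Exhaustive scan over every integer size 1..N (float dtype avoids
    # integer-to-negative-power errors in np.power).
    ns = np.arange(1, N + 1, dtype=float)
    losses = total_loss(ns, N, k1, a, b, c)
    i = int(np.argmin(losses))
    return int(ns[i]), float(losses[i])

# Illustrative parameters only: the model beats the baseline (k2 < k1)
# once it has been trained on enough held-out patients.
n_opt, loss_opt = optimal_holdout_size(N=100_000, k1=1.0, a=5.0, b=0.5, c=0.6)
print(n_opt, loss_opt)
```

A brute-force scan is used because it is easy to check and does not depend on any particular shape of the loss; with a fitted learning curve in place of the assumed one, the same structure applies.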