Risk models in medical statistics and healthcare machine learning are increasingly used to guide clinical and other interventions. If such a model is updated using data gathered after it has guided interventions, the update can undermine the model's own predictive accuracy, because those interventions alter the outcomes the model is meant to predict. The use of a 'holdout set' (a subset of the population that does not receive interventions guided by the model) has been proposed to prevent this. Because patients in the holdout set do not benefit from risk predictions, its size must balance maximising model performance against minimising the number of patients held out. By defining a general loss function, we prove the existence and uniqueness of an optimal holdout set size, and introduce parametric and semi-parametric algorithms for its estimation. We demonstrate their use on a recent risk score for pre-eclampsia. Based on these results, we argue that a holdout set is a safe, viable and easily implemented solution to the model update problem.