Clinical prediction models estimate an individual's risk of a particular health outcome, conditional on their values of multiple predictors. A developed model is a consequence of the development dataset and the chosen model building strategy, including the sample size, number of predictors and analysis method (e.g., regression or machine learning). Here, we raise the concern that many models are developed using small datasets, which leads to instability in the model and its predictions (estimated risks). We define four levels of model stability in estimated risks, moving from the overall mean to the individual level. Then, through simulation and case studies of statistical and machine learning approaches, we show that instability in a model's estimated risks is often considerable, and ultimately manifests itself as miscalibration of predictions in new data. Therefore, we recommend that researchers always examine instability at the model development stage, and we propose instability plots and measures to do so. This entails repeating the model building steps (those used in the development of the original prediction model) in each of multiple (e.g., 1000) bootstrap samples, to produce multiple bootstrap models, and then deriving (i) a prediction instability plot of bootstrap model predictions (y-axis) versus original model predictions (x-axis); (ii) a calibration instability plot showing calibration curves for the bootstrap models in the original sample; and (iii) the instability index, which is the mean absolute difference between individuals' original and bootstrap model predictions. A case study is used to illustrate how these instability assessments help to establish whether model predictions are likely to be reliable, whilst also informing a model's critical appraisal (risk of bias rating), fairness assessment and further validation requirements.
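As a minimal sketch of the bootstrap procedure described above, the following Python code refits a model in each bootstrap sample, applies every bootstrap model to the original individuals, and computes the mean absolute difference between original and bootstrap predictions. Everything specific here is an assumption made for illustration and is not taken from the paper: the model is a one-predictor logistic regression fitted by plain gradient ascent, the data are simulated, and the names `simulate`, `fit_logistic`, `predict` and `instability` are hypothetical. In a real analysis, `fit_logistic` would be replaced by the full model building strategy (including any variable selection or tuning), and far more bootstrap samples (e.g., 1000) would be used.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n, seed=0):
    """Simulated data (an assumption): x ~ N(0, 1), logit(risk) = -1 + x."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(-1.0 + x) else 0 for x in xs]
    return xs, ys


def fit_logistic(xs, ys, n_iter=200, lr=0.5):
    """Stand-in for the model building steps: intercept and slope
    estimated by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def predict(model, xs):
    """Estimated risks from a fitted (intercept, slope) pair."""
    b0, b1 = model
    return [sigmoid(b0 + b1 * x) for x in xs]


def instability(xs, ys, n_boot=200, seed=0):
    """Return original predictions, bootstrap predictions, the
    per-individual mean absolute difference, and its average."""
    rng = random.Random(seed)
    n = len(xs)
    original = predict(fit_logistic(xs, ys), xs)
    boot_preds = []
    for _ in range(n_boot):
        # Resample individuals with replacement, then repeat the model fit.
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit_logistic([xs[i] for i in idx], [ys[i] for i in idx])
        # Each bootstrap model is applied to the ORIGINAL individuals.
        boot_preds.append(predict(model, xs))
    per_individual = [
        sum(abs(bp[i] - original[i]) for bp in boot_preds) / n_boot
        for i in range(n)
    ]
    return original, boot_preds, per_individual, sum(per_individual) / n


if __name__ == "__main__":
    for n in (40, 400):
        xs, ys = simulate(n, seed=1)
        _, _, _, avg = instability(xs, ys, n_boot=50, seed=2)
        print(f"n={n}: average absolute difference = {avg:.3f}")
```

The pairs `(original[i], boot_preds[b][i])` are the points that a prediction instability plot would display, with the original prediction on the x-axis and the bootstrap prediction on the y-axis. The calibration instability plot is not sketched here; it would additionally require fitting a calibration curve for each bootstrap model's predictions against the observed outcomes in the original sample.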