A prediction model is most useful if it generalizes beyond the development data, as demonstrated through external validation, but to what extent it should generalize remains unclear. In practice, prediction models are often externally validated using data from very different settings, including populations from other health systems or countries, with predictably poor results. Such results may not fairly reflect the performance of a model that was designed for a specific target population or setting, and may stretch the model beyond its expected generalizability. To address this, we suggest externally validating a model using new data from the target population, so that the validation performance has clear implications for the model's reliability, whereas generalizability to broader settings should be investigated carefully during model development rather than explored post hoc. Based on this perspective, we propose a roadmap that facilitates the development and application of reliable, fair, and trustworthy artificial intelligence prediction models.
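The distinction above can be made concrete with a small sketch. The code below is illustrative only: the data are synthetic, the model choice (logistic regression) and the settings are assumptions, not the authors' method. It contrasts validating on a fresh sample from the same target population with validating on a shifted "different setting" (here, a different baseline risk), where discrimination may transport but calibration degrades.

```python
# Hedged sketch: external validation on a NEW sample from the SAME target
# population, versus a sample from a DIFFERENT setting. All data are
# synthetic; logistic regression stands in for any prediction model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(0)

def simulate(n, intercept=0.0):
    """Two predictors; a nonzero `intercept` mimics a different setting
    with a different baseline risk (a transportability failure)."""
    X = rng.normal(size=(n, 2))
    logit = intercept + 1.5 * X[:, 0] - 1.0 * X[:, 1]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    return X, y

# Development data and a fresh sample from the same target population
X_dev, y_dev = simulate(2000)
X_target, y_target = simulate(1000)                 # fair external validation
X_other, y_other = simulate(1000, intercept=2.0)    # stretched generalization

model = LogisticRegression().fit(X_dev, y_dev)

results = {}
for name, X, y in [("target", X_target, y_target), ("other", X_other, y_other)]:
    p = model.predict_proba(X)[:, 1]
    # AUC measures discrimination; the Brier score also reflects calibration.
    results[name] = (roc_auc_score(y, p), brier_score_loss(y, p))
    print(f"{name}: AUC={results[name][0]:.3f}, Brier={results[name][1]:.3f}")
```

In this toy setup the shifted setting typically keeps reasonable discrimination but shows worse calibration (a higher Brier score), which is one reason poor results in very different settings say little about the model's reliability in its intended target population.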