We present methods for estimating loss-based measures of the performance of a prediction model in a target population that differs from the source population in which the model was developed, in settings where outcome and covariate data are available from the source population but only covariate data are available from a simple random sample of the target population. Prior work adjusting for differences between the two populations has used various weighting estimators with inverse odds or density ratio weights. Here, we develop more robust estimators for the target population risk (expected loss) that can be used with data-adaptive (e.g., machine learning-based) estimation of nuisance parameters. We examine the large-sample properties of the estimators and evaluate their finite-sample performance in simulations. Finally, we apply the methods to data from lung cancer screening, using nationally representative data from the National Health and Nutrition Examination Survey (NHANES), and extend the methods to account for the complex survey design of NHANES.
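To make the weighting idea from prior work concrete, the following is a minimal sketch of an inverse-odds weighted estimate of the target population risk. It is an illustration under stated assumptions, not the more robust estimators developed in the paper: the function names `fit_logistic` and `inverse_odds_risk` are hypothetical, the probability of source membership is modeled with a plain logistic regression fit by Newton's method, and the weights are normalized to sum to one, which is a choice made here for stability.

```python
import numpy as np


def fit_logistic(X, s, n_iter=25):
    """Fit P(S = 1 | X) with logistic regression via Newton's method.

    Hypothetical helper for this sketch; returns coefficients with the
    intercept first.
    """
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        grad = Xd.T @ (s - p)
        # Small ridge term keeps the Hessian invertible (assumption of this sketch).
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd + 1e-8 * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def inverse_odds_risk(loss_source, X_source, X_target):
    """Inverse-odds weighted estimate of the target population risk.

    loss_source: per-observation losses of the prediction model, computed on
                 the source sample (where outcomes are observed).
    X_source, X_target: covariate matrices of shape (n, d) from the source
                 and target samples.
    """
    # Stack both samples and label source membership: S = 1 source, S = 0 target.
    X = np.vstack([X_source, X_target])
    s = np.concatenate([np.ones(len(X_source)), np.zeros(len(X_target))])
    beta = fit_logistic(X, s)

    # Estimated probability of source membership for each source observation.
    Xs = np.column_stack([np.ones(len(X_source)), X_source])
    p_source = 1.0 / (1.0 + np.exp(-(Xs @ beta)))

    # Inverse odds of source membership, normalized to sum to one.
    w = (1.0 - p_source) / p_source
    return float(np.sum(w * loss_source) / np.sum(w))
```

Because the weights depend entirely on the fitted membership model, this estimator is sensitive to misspecification of that model, which is the kind of limitation that motivates estimators with better robustness properties.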