Regression inference for multiple populations by integrating summary-level data using stacked imputations

There is a growing need for flexible general frameworks that integrate individual-level data with external summary information for improved statistical inference. This paper proposes an imputation-based methodology where the goal is to fit an outcome regression model with all available variables in the internal study while utilizing summary information from external models that may have used only a subset of the predictors. The method allows for heterogeneity of covariate effects across the external populations. The proposed approach generates synthetic outcome data in each population, uses stacked multiple imputation to create a long dataset with complete covariate information, and finally analyzes the imputed data with weighted regression. This flexible and unified approach attains the following four objectives: (i) incorporating supplementary information from a broad class of externally fitted predictive models or established risk calculators, which could be based on parametric regression or machine learning methods, as long as the external model can generate outcome values given covariates; (ii) improving statistical efficiency of the estimated coefficients in the internal study; (iii) improving predictions by utilizing even partial information available from models that use only a subset of the full set of covariates used in the internal study; and (iv) providing valid statistical inference for the external population with potentially different covariate effects from the internal population. Applications include prostate cancer risk prediction models using novel biomarkers that are measured only in the internal study.
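The three steps named in the abstract (synthetic outcomes, stacked multiple imputation, weighted regression) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single external population, a linear Gaussian outcome, one covariate `x` shared with the external model, and one biomarker `b` measured only internally. The function names, the simulated data, and the simple regression-based imputation of `b` are all hypothetical choices made for illustration. The sketch does not cover heterogeneous covariate effects or the variance estimation needed for valid inference.

```python
import numpy as np


def weighted_least_squares(design, y, w):
    """Minimize sum_i w_i * (y_i - design_i @ beta)^2 by rescaling rows."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef


def stacked_synthetic_fit(x, b, y, external_predict, n_synth, n_imp, rng):
    """Illustrative pipeline: synthetic outcomes, stacked imputation, weighted fit."""
    n = len(y)

    # Step 1: resample internal x and let the external model generate outcomes.
    x_syn = x[rng.integers(0, n, size=n_synth)]
    y_syn = external_predict(x_syn, rng)

    # Step 2: imputation model for b given (x, y), fitted on internal data only.
    # Simplification: parameters are held fixed rather than redrawn per imputation.
    z_int = np.column_stack([np.ones(n), x, y])
    gamma, *_ = np.linalg.lstsq(z_int, b, rcond=None)
    resid_sd = np.std(b - z_int @ gamma, ddof=z_int.shape[1])
    z_syn = np.column_stack([np.ones(n_synth), x_syn, y_syn])

    # Step 3: stack n_imp completed copies into one long dataset.
    designs, outcomes = [], []
    for _ in range(n_imp):
        b_syn = z_syn @ gamma + rng.normal(0.0, resid_sd, n_synth)
        x_all = np.concatenate([x, x_syn])
        b_all = np.concatenate([b, b_syn])
        designs.append(np.column_stack([np.ones(n + n_synth), x_all, b_all]))
        outcomes.append(np.concatenate([y, y_syn]))
    design = np.vstack(designs)
    outcome = np.concatenate(outcomes)

    # Step 4: each stacked row gets weight 1 / n_imp in the outcome regression.
    weights = np.full(len(outcome), 1.0 / n_imp)
    return weighted_least_squares(design, outcome, weights)


# Simulated internal study (hypothetical): y = 1 + 2*x + 1.5*b + noise.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
b = 0.5 * x + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * b + rng.normal(size=n)


def external_predict(x_new, rng):
    # Hypothetical external model that saw only x: the implied reduced model
    # E[y | x] = 1 + 2.75*x with residual variance 1.5**2 + 1 = 3.25.
    return 1.0 + 2.75 * x_new + rng.normal(0.0, np.sqrt(3.25), len(x_new))


coef = stacked_synthetic_fit(x, b, y, external_predict,
                             n_synth=2000, n_imp=10, rng=rng)
print(coef)  # estimates of (intercept, x effect, b effect)
```

In this toy setting the external model is the reduced model implied by the internal one, so the stacked fit should recover coefficients close to the simulated values. Any callable that returns outcome draws given covariates could stand in for `external_predict`, which mirrors the abstract's requirement that the external model need only generate outcome values given covariates.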
