There is a growing need for flexible, general frameworks that integrate individual-level data with external summary information to improve statistical inference. External information relevant to a risk prediction model may come in multiple forms, such as regression coefficient estimates or predicted values of the outcome variable. Different external models may use different sets of predictors, and the algorithms they use to predict the outcome Y given these predictors may or may not be known. The populations underlying the external models may differ from each other and from the internal study population. Motivated by a prostate cancer risk prediction problem in which novel biomarkers are measured only in the internal study, this paper proposes an imputation-based methodology whose goal is to fit a target regression model with all available predictors in the internal study while utilizing summary information from external models that may have used only a subset of the predictors. The method allows for heterogeneity of covariate effects across the external populations. The proposed approach generates synthetic outcome data in each external population and uses a stacked multiple imputation technique to create a long dataset with complete covariate information. The final analysis of the stacked imputed data is conducted by weighted regression. This flexible and unified approach can improve the statistical efficiency of the estimated coefficients in the internal study, improve predictions by utilizing even partial information from models that use a subset of the full set of covariates used in the internal study, and provide statistical inference for external populations whose covariate effects may differ from those of the internal population.
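The three steps named in the abstract (synthetic outcome generation, stacked multiple imputation, weighted regression) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a linear outcome rather than a risk model, a single external model whose population matches the internal one, and an external summary consisting of hypothetical reduced-model coefficients `ext_coef` and a residual standard deviation `ext_sigma`. The simulated data, the variable names, and the weighting scheme (weight 1 for fully observed internal rows, 1/M for each of the M imputed copies of a synthetic row) are all illustrative assumptions. Heterogeneous covariate effects and variance estimation are not covered.

```python
import numpy as np

rng = np.random.default_rng(0)

# Internal study (simulated for illustration): outcome y, shared predictor x,
# and a novel biomarker b that no external model has seen.
n_int = 200
x_int = rng.normal(size=n_int)
b_int = 0.5 * x_int + rng.normal(size=n_int)
y_int = 1.0 + 2.0 * x_int + 1.5 * b_int + rng.normal(size=n_int)

# Hypothetical external summary: coefficients of a reduced model y ~ x
# and its residual standard deviation (assumed values, not from the paper).
ext_coef = np.array([1.0, 2.75])
ext_sigma = np.sqrt(3.25)

# Step 1: generate synthetic outcomes from the external model.
n_ext = 2000
x_ext = rng.normal(size=n_ext)
y_ext = ext_coef[0] + ext_coef[1] * x_ext + rng.normal(scale=ext_sigma, size=n_ext)

# Step 2: fit an imputation model b ~ x + y on the internal data, then
# impute the missing biomarker M times for the synthetic rows and stack.
# (Simplification: imputation parameters are held fixed across imputations.)
D_int = np.column_stack([np.ones(n_int), x_int, y_int])
gamma, *_ = np.linalg.lstsq(D_int, b_int, rcond=None)
resid = b_int - D_int @ gamma
imp_sigma = np.sqrt(resid @ resid / (n_int - D_int.shape[1]))

M = 10
D_ext = np.column_stack([np.ones(n_ext), x_ext, y_ext])
# Columns of each stacked block: x, b, y, weight.
blocks = [np.column_stack([x_int, b_int, y_int, np.ones(n_int)])]
for _ in range(M):
    b_imp = D_ext @ gamma + rng.normal(scale=imp_sigma, size=n_ext)
    blocks.append(np.column_stack([x_ext, b_imp, y_ext, np.full(n_ext, 1.0 / M)]))
stacked = np.vstack(blocks)


def weighted_least_squares(Z, y, w):
    """Solve the weighted normal equations (Z' W Z) beta = Z' W y."""
    ZtW = Z.T * w  # scales the column of Z.T for each observation by its weight
    return np.linalg.solve(ZtW @ Z, ZtW @ y)


# Step 3: fit the target model y ~ x + b on the stacked data with weights.
Z = np.column_stack([np.ones(len(stacked)), stacked[:, 0], stacked[:, 1]])
beta_hat = weighted_least_squares(Z, stacked[:, 2], stacked[:, 3])
print(beta_hat)  # intercept, x effect, biomarker effect
```

In this sketch the synthetic rows contribute information about the relationship between `y` and `x`, while the internal rows alone identify how the biomarker `b` relates to both. The point estimates come from a single weighted fit on the long dataset; valid standard errors would need an adjustment that accounts for the imputation, which is omitted here.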