Estimating causal effects from randomized experiments is central to clinical research. Reducing the statistical uncertainty in these analyses is an important objective for statisticians. Registries, prior trials, and health records constitute a growing compendium of historical data on patients receiving the standard of care that could be exploited to this end. However, most methods for historical borrowing achieve reductions in variance by sacrificing strict type I error rate control. Here, we propose a use of historical data that exploits linear covariate adjustment to improve the efficiency of trial analyses without incurring bias. Specifically, we train a prognostic model on the historical data, then estimate the treatment effect using linear regression while adjusting for the trial subjects' predicted outcomes (their prognostic scores). We prove that, under certain conditions, this prognostic covariate adjustment procedure attains the minimum variance possible among a large class of estimators. When those conditions are not met, prognostic covariate adjustment is still more efficient than raw covariate adjustment, and the gain in efficiency is proportional to a measure of the predictive accuracy of the prognostic model above and beyond the linear relationship with the raw covariates. We demonstrate the approach using simulations and a reanalysis of an Alzheimer's disease clinical trial and observe meaningful reductions in mean squared error and estimated variance. Lastly, we provide a simplified formula for asymptotic variance that enables power calculations that account for these gains. Sample size reductions between 10% and 30% are attainable when using prognostic models that explain a clinically realistic percentage of the outcome variance.
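The two-step procedure described in the abstract (fit a prognostic model on historical controls, then adjust for its predictions in a linear regression on the trial data) can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the data-generating process, the quadratic least-squares prognostic model, and all function names (`fit_prognostic_model`, `control_outcome`, `treatment_effect`, `run_simulation`) are hypothetical choices made for illustration. In practice, any predictive model trained on historical data could play the role of the prognostic model.

```python
import numpy as np


def quad_features(X):
    # Intercept, linear, and squared terms: a deliberately simple model class
    # standing in for whatever prognostic model one would actually train.
    return np.column_stack([np.ones(len(X)), X, X ** 2])


def fit_prognostic_model(X_hist, y_hist):
    # Step 1: learn to predict the control outcome from baseline covariates,
    # using historical (standard-of-care) data only.
    coef, *_ = np.linalg.lstsq(quad_features(X_hist), y_hist, rcond=None)
    return lambda X: quad_features(X) @ coef


def control_outcome(X, rng):
    # Hypothetical data-generating process (an assumption of this sketch):
    # the outcome is nonlinear in the first covariate.
    return X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=len(X))


def treatment_effect(y, treat, covariates):
    # Step 2: OLS of the outcome on an intercept, the treatment indicator,
    # and centered covariates; the treatment coefficient is the estimate.
    centered = covariates - covariates.mean(axis=0)
    design = np.column_stack([np.ones(len(y)), treat, centered])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


def run_simulation(n_hist=2000, n_trial=200, true_effect=1.0, n_rep=500, seed=0):
    rng = np.random.default_rng(seed)
    X_hist = rng.normal(size=(n_hist, 2))
    prognostic = fit_prognostic_model(X_hist, control_outcome(X_hist, rng))

    raw, prog = [], []
    for _ in range(n_rep):
        X = rng.normal(size=(n_trial, 2))
        treat = rng.permutation(np.arange(n_trial) % 2)  # 1:1 randomization
        y = control_outcome(X, rng) + true_effect * treat
        score = prognostic(X)  # prognostic score for each trial subject
        # Raw covariate adjustment versus raw covariates plus prognostic score.
        raw.append(treatment_effect(y, treat, X))
        prog.append(treatment_effect(y, treat, np.column_stack([X, score])))
    return np.array(raw), np.array(prog)


raw_est, prog_est = run_simulation()
print("raw adjustment:        mean %.3f, variance %.5f" % (raw_est.mean(), raw_est.var()))
print("prognostic adjustment: mean %.3f, variance %.5f" % (prog_est.mean(), prog_est.var()))
```

Because the simulated outcome depends on a squared covariate, linear adjustment for the raw covariates alone cannot capture that part of the signal, whereas the prognostic score can. Both estimators should center on the true effect, with the prognostic-adjusted one showing the smaller spread across replications. The size of the gap here reflects this toy setup and should not be read as an indication of gains in a real trial.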