The era of big data has witnessed an increasing availability of multiple data sources for statistical analyses. We consider estimation of causal effects by combining big main data, in which some confounders are unmeasured, with smaller validation data that carry supplementary information on these confounders. Under the unconfoundedness assumption with completely observed confounders, the smaller validation data allow for constructing consistent estimators of causal effects, whereas the big main data can in general give only error-prone estimators. However, by leveraging the information in the big main data in a principled way, we can improve estimation efficiency while preserving the consistency of the initial estimators based solely on the validation data. Our framework applies to asymptotically normal estimators, including the commonly used regression imputation, weighting, and matching estimators, and does not require a correct specification of the model relating the unmeasured confounders to the observed variables. We also propose appropriate bootstrap procedures, which make our method straightforward to implement using software routines for existing estimators.
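The abstract does not spell out the combination rule, so the following is only a minimal sketch of one way the idea could be realised: a control-variate style adjustment. It assumes that the same error-prone estimator, computed once on the validation data and once on the main data, converges to the same limit, so their difference has mean near zero and can be used to remove noise from the consistent validation estimator. The data-generating model, the OLS-based estimators, and all function names (`simulate`, `ols_effect`, `three_estimates`, `resample`, `combine`) are hypothetical choices for illustration, and the two samples are assumed to be independent draws from the same population.

```python
import numpy as np


def ols_effect(y, a, covs):
    """Coefficient on treatment `a` from an OLS fit of y on [1, a, covs]."""
    design = np.column_stack([np.ones_like(y), a, covs])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def simulate(n, rng):
    """Hypothetical data: x is observed, u is a confounder, true effect is 1."""
    x = rng.normal(size=n)
    u = 0.5 * x + rng.normal(size=n)
    a = (x + u + rng.normal(size=n) > 0).astype(float)
    y = 1.0 * a + x + u + rng.normal(size=n)
    return y, a, x, u


def three_estimates(main, val):
    """Consistent validation estimate plus the two error-prone estimates."""
    y1, a1, x1, _ = main  # u is treated as unmeasured in the main data
    y2, a2, x2, u2 = val
    tau_val = ols_effect(y2, a2, np.column_stack([x2, u2]))  # adjusts for u
    tau_val_ep = ols_effect(y2, a2, x2)   # error-prone, validation data
    tau_main_ep = ols_effect(y1, a1, x1)  # error-prone, main data
    return tau_val, tau_val_ep, tau_main_ep


def resample(data, rng):
    """Nonparametric bootstrap: draw rows with replacement."""
    n = len(data[0])
    idx = rng.integers(0, n, size=n)
    return tuple(col[idx] for col in data)


def combine(main, val, n_boot=200, seed=0):
    """Control-variate adjustment with bootstrap-estimated (co)variances."""
    rng = np.random.default_rng(seed)
    tau_val, tau_val_ep, tau_main_ep = three_estimates(main, val)
    reps = np.array([
        three_estimates(resample(main, rng), resample(val, rng))
        for _ in range(n_boot)
    ])
    diff = reps[:, 1] - reps[:, 2]     # bootstrap draws of the mean-zero contrast
    cov = np.cov(reps[:, 0], diff)     # 2x2 covariance matrix
    gamma, v = cov[0, 1], cov[1, 1]
    tau_adj = tau_val - gamma / v * (tau_val_ep - tau_main_ep)
    var_val = cov[0, 0]
    var_adj = var_val - gamma ** 2 / v  # never larger than var_val
    return tau_val, tau_adj, var_val, var_adj


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    main, val = simulate(5000, rng), simulate(500, rng)
    tau_val, tau_adj, var_val, var_adj = combine(main, val)
    print("validation only:", tau_val, "variance:", var_val)
    print("adjusted:       ", tau_adj, "variance:", var_adj)
```

In this sketch the bootstrap only re-runs the existing estimator routines on resampled data, which is the sense in which such a procedure can be layered on top of standard software. The variance of the adjusted estimate cannot exceed that of the validation-only estimate, because the subtracted term `gamma ** 2 / v` is non-negative.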