Estimating causal effects from large experimental and observational data has become increasingly prevalent in both industry and research. The bootstrap is an intuitive and powerful technique for constructing standard errors and confidence intervals of estimators. Its application, however, can be prohibitively demanding in settings involving large data. In addition, modern causal inference estimators based on machine learning and optimization techniques exacerbate the computational burden of the bootstrap. The bag of little bootstraps has been proposed in non-causal settings for large data but has not yet been applied to evaluate the properties of estimators of causal effects. In this paper, we introduce a new bootstrap algorithm, called the causal bag of little bootstraps, for causal inference with large data. The new algorithm significantly improves the computational efficiency of the traditional bootstrap while providing consistent estimates and desirable confidence interval coverage. We describe its properties, provide practical considerations, and evaluate the performance of the proposed algorithm in terms of bias, coverage of 95% confidence intervals, and computational time in a simulation study. We apply it to evaluate the effect of hormone therapy on the average time to coronary heart disease using a large observational data set from the Women's Health Initiative.
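To make the resampling idea concrete, the following is a minimal sketch of the generic (non-causal) bag of little bootstraps applied to a simple difference in means between treated and control units. It is not the causal bag of little bootstraps introduced in the paper, whose details the abstract does not give. The function name `blb_diff_in_means`, the subset-size exponent `gamma`, the default numbers of subsets and resamples, and the choice to center a normal-approximation interval on the full-sample estimate are all illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist


def blb_diff_in_means(y, t, n_subsets=5, n_resamples=100, gamma=0.7,
                      alpha=0.05, seed=0):
    """Sketch of a bag-of-little-bootstraps standard error for a
    difference in means (hypothetical helper, not the paper's algorithm)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    n = len(y)
    b = int(n ** gamma)  # size of each small subset (assumed rule b = n^gamma)

    subset_ses = []
    for _ in range(n_subsets):
        # Draw a small subset of b units without replacement.
        idx = rng.choice(n, size=b, replace=False)
        y_s, t_s = y[idx], t[idx]

        # Each resample assigns multinomial counts summing to n to the b units,
        # so every replicate mimics a full-size bootstrap sample.
        counts = rng.multinomial(n, np.full(b, 1.0 / b), size=n_resamples)

        # Weighted group means for all replicates at once.
        treated_mean = (counts @ (t_s * y_s)) / (counts @ t_s)
        control_mean = (counts @ ((1 - t_s) * y_s)) / (counts @ (1 - t_s))
        estimates = treated_mean - control_mean

        subset_ses.append(np.std(estimates, ddof=1))

    # Average the per-subset standard errors.
    se = float(np.mean(subset_ses))

    # Assumption: center the interval on the full-sample point estimate.
    estimate = float(y[t == 1].mean() - y[t == 0].mean())
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return {"estimate": estimate, "se": se,
            "ci": (estimate - z * se, estimate + z * se)}


# Usage on simulated data with a known effect of 2 (illustrative only).
rng = np.random.default_rng(1)
t_sim = rng.integers(0, 2, size=20000)
y_sim = 2.0 * t_sim + rng.normal(size=20000)
print(blb_diff_in_means(y_sim, t_sim))
```

Each replicate only touches `b` distinct units rather than all `n`, which is where the computational saving over the traditional bootstrap comes from. A costlier estimator would replace the weighted difference in means, provided it accepts observation weights.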