There is a long-standing debate in statistics, epidemiology and econometrics as to whether nonparametric estimation that uses data-adaptive methods, such as machine learning algorithms in model fitting, confers any meaningful advantage over simpler, parametric approaches in real-world, finite-sample estimation of causal effects. We address the question: when estimating the effect of a treatment on an outcome, across a universe of reasonable data distributions, how much does the choice of nonparametric vs. parametric estimation matter? Instead of answering this question with simulations that reflect a few chosen data scenarios, we propose a novel approach that evaluates performance across thousands of data-generating mechanisms drawn from nonparametric models with semi-informative priors. We call this approach a Universal Monte-Carlo Simulation. We compare the performance of two parametric estimators of the average treatment effect (a g-computation estimator that uses a parametric outcome model and an inverse probability of treatment weighted estimator) and two nonparametric estimators (Bayesian additive regression trees and a targeted minimum loss-based estimator that uses an ensemble of machine learning algorithms in model fitting). We summarize estimator performance in terms of bias, confidence interval coverage, and mean squared error. We find that the nonparametric estimators nearly always outperform the parametric estimators. The exception is the smallest sample size, N=100, where the nonparametric estimators have similar bias and similar-to-slightly-worse coverage.
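To make the evaluation loop concrete, here is a minimal sketch in Python of the general idea: draw many data-generating mechanisms at random, simulate data from each, and summarize estimator bias and mean squared error. It is an illustration under strong simplifying assumptions, not the procedure used in this work. The names `draw_dgm`, `simulate` and `run_simulation`, the Gaussian priors, and the linear, correctly specified data-generating mechanisms are all hypothetical. Only the two parametric estimators are sketched (g-computation with a linear outcome model, and a normalized IPTW estimator with a logistic propensity model), and confidence interval coverage is omitted because it would require variance estimates.

```python
import numpy as np


def draw_dgm(rng):
    """Draw one hypothetical data-generating mechanism (assumed toy prior)."""
    return {
        "a": rng.normal(0.0, 0.5, size=2),  # propensity coefficients on W
        "b": rng.normal(0.0, 1.0, size=2),  # outcome coefficients on W
        "tau": rng.normal(0.0, 1.0),        # true ATE (constant effect)
    }


def simulate(dgm, n, rng):
    """Simulate covariates W, binary treatment A and outcome Y."""
    W = rng.normal(size=(n, 2))
    p = 1.0 / (1.0 + np.exp(-(W @ dgm["a"])))
    A = rng.binomial(1, p)
    Y = dgm["tau"] * A + W @ dgm["b"] + rng.normal(size=n)
    return W, A, Y


def gcomp_ate(W, A, Y):
    """G-computation: fit a linear outcome model, then average the
    difference in predictions with A set to 1 and to 0 for everyone."""
    n = len(Y)
    X = np.column_stack([np.ones(n), A, W])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    X1 = np.column_stack([np.ones(n), np.ones(n), W])
    X0 = np.column_stack([np.ones(n), np.zeros(n), W])
    return float(np.mean(X1 @ beta - X0 @ beta))


def fit_logistic(X, a, iters=25):
    """Logistic regression by Newton's method (tiny ridge for stability)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (a - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess + 1e-8 * np.eye(X.shape[1]), grad)
    return beta


def iptw_ate(W, A, Y):
    """Normalized (Hajek) inverse probability of treatment weighting."""
    X = np.column_stack([np.ones(len(Y)), W])
    p = 1.0 / (1.0 + np.exp(-(X @ fit_logistic(X, A))))
    w1, w0 = A / p, (1 - A) / (1.0 - p)
    return float(np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0))


def run_simulation(n_dgms=20, n_reps=20, n=500, seed=0):
    """Return bias and MSE of each estimator, pooled over all DGMs."""
    rng = np.random.default_rng(seed)
    estimators = {"gcomp": gcomp_ate, "iptw": iptw_ate}
    errors = {name: [] for name in estimators}
    for _ in range(n_dgms):
        dgm = draw_dgm(rng)
        for _ in range(n_reps):
            W, A, Y = simulate(dgm, n, rng)
            for name, est in estimators.items():
                errors[name].append(est(W, A, Y) - dgm["tau"])
    return {
        name: {"bias": float(np.mean(e)), "mse": float(np.mean(np.square(e)))}
        for name, e in errors.items()
    }


if __name__ == "__main__":
    for name, summary in run_simulation().items():
        print(name, summary)
```

Because every mechanism in this toy setup is linear, the parametric estimators are correctly specified by construction, so the sketch says nothing about the parametric vs. nonparametric comparison itself. It only shows the shape of the loop over randomly drawn mechanisms and the pooled performance summaries.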