Due to concerns about parametric model misspecification, there is interest in using machine learning to adjust for confounding when evaluating the causal effect of an exposure on an outcome. Unfortunately, exposure effect estimators that rely on machine learning predictions are generally subject to so-called plug-in bias, which can render naive p-values and confidence intervals invalid. Progress has been made via proposals like targeted maximum likelihood estimation and, more recently, double machine learning, which rely on learning the conditional mean of both the outcome and the exposure. Valid inference can then be obtained so long as both predictions converge (sufficiently fast) to the truth. Focusing on partially linear regression models, we show that a specific implementation of these machine learning techniques can yield exposure effect estimators that have small bias even when one of the first-stage predictions does not converge to the truth. The resulting tests and confidence intervals are doubly robust. We also show that the proposed estimators may fail to be regular when only one nuisance parameter is consistently estimated; nevertheless, we observe in simulation studies that our proposal leads to reduced bias and improved confidence interval coverage in moderate samples.
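For readers unfamiliar with the double machine learning recipe referred to above, the sketch below implements the standard cross-fitted "partialling-out" estimator for a partially linear model Y = theta*A + g(X) + eps, in which the conditional means of the outcome and the exposure are learned in a first stage and the exposure effect theta is estimated by residual-on-residual regression. This is a minimal generic illustration under assumed choices (random-forest learners, 5 folds, synthetic data), not the specific doubly robust implementation proposed in the paper.

```python
# Minimal sketch of cross-fitted double machine learning for the
# partially linear model Y = theta * A + g(X) + eps.
# Illustrative only: learners, fold count, and data are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_plr(Y, A, X, n_folds=5, seed=0):
    """Cross-fitted exposure-effect estimate and standard error."""
    n = len(Y)
    y_res = np.zeros(n)   # Y - estimated E[Y | X]
    a_res = np.zeros(n)   # A - estimated E[A | X]
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train, test in kf.split(X):
        # First stage: learn the conditional means of the outcome and the
        # exposure on the training folds, predict on the held-out fold.
        m_y = RandomForestRegressor(random_state=seed).fit(X[train], Y[train])
        m_a = RandomForestRegressor(random_state=seed).fit(X[train], A[train])
        y_res[test] = Y[test] - m_y.predict(X[test])
        a_res[test] = A[test] - m_a.predict(X[test])
    # Second stage: residual-on-residual regression gives theta_hat.
    theta = np.sum(a_res * y_res) / np.sum(a_res ** 2)
    # Influence-function-based plug-in standard error.
    psi = a_res * (y_res - theta * a_res)
    J = np.mean(a_res ** 2)
    se = np.sqrt(np.mean(psi ** 2) / J ** 2 / n)
    return theta, se

# Synthetic example with true exposure effect theta = 1.0.
rng = np.random.default_rng(0)
n, p = 1000, 5
X = rng.normal(size=(n, p))
A = np.sin(X[:, 0]) + 0.5 * rng.normal(size=n)
Y = 1.0 * A + X[:, 1] ** 2 + rng.normal(size=n)
theta_hat, se_hat = dml_plr(Y, A, X)
print(f"theta_hat = {theta_hat:.3f}, 95% CI = "
      f"({theta_hat - 1.96 * se_hat:.3f}, {theta_hat + 1.96 * se_hat:.3f})")
```

In this generic construction, valid inference requires both first-stage predictions to converge sufficiently fast to the truth; the paper's contribution is a modified implementation whose tests and confidence intervals remain approximately valid when only one of the two first-stage predictions is consistent.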