We consider the problem of variance reduction in randomized controlled trials through the use of covariates that are correlated with the outcome but independent of the treatment. We propose a machine learning regression-adjusted treatment effect estimator, which we call MLRATE. MLRATE uses machine learning predictors of the outcome to reduce estimator variance. It employs cross-fitting to avoid overfitting biases, and we prove consistency and asymptotic normality under general conditions. MLRATE is robust to poor predictions from the machine learning step: if the predictions are uncorrelated with the outcomes, the estimator performs asymptotically no worse than the standard difference-in-means estimator, while if the predictions are highly correlated with the outcomes, the efficiency gains are large. In A/A tests on a set of 48 outcome metrics commonly monitored in Facebook experiments, the estimator has over 70% lower variance than the simple difference-in-means estimator and about 19% lower variance than the common univariate procedure that adjusts only for pre-experiment values of the outcome.
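The two steps described above (cross-fitted machine learning predictions, then a regression adjustment) can be illustrated with a rough NumPy sketch. It is not the authors' implementation. The names `fit_predictor` and `mlrate_sketch` are hypothetical, and a ridge regression on quadratic features stands in for an arbitrary machine learning model. We assume the second step is an OLS regression of the outcome on the treatment indicator, the centered prediction, and their interaction, with heteroskedasticity-robust (HC0) standard errors. This specification may differ in detail from the paper's. The data are synthetic and only meant to show the variance reduction relative to difference-in-means.

```python
import numpy as np


def fit_predictor(X_train, y_train, ridge=1e-3):
    """Hypothetical stand-in for the ML step: ridge regression on [1, X, X**2].

    Any regressor could be swapped in; only its out-of-fold predictions matter.
    """
    def features(X):
        return np.column_stack([np.ones(len(X)), X, X ** 2])

    F = features(X_train)
    # Closed-form ridge solution (the intercept is penalized too, for simplicity).
    beta = np.linalg.solve(F.T @ F + ridge * np.eye(F.shape[1]), F.T @ y_train)
    return lambda X_new: features(X_new) @ beta


def mlrate_sketch(y, t, X, n_folds=2, seed=0):
    """Cross-fitted, regression-adjusted effect estimate and its robust SE.

    Assumed second step: OLS of y on [1, t, g - mean(g), t * (g - mean(g))];
    this may differ in detail from the paper's exact specification.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    g = np.empty(n)
    # Step 1 (cross-fitting): predict each fold with a model trained on the
    # other folds, using covariates only, never the treatment indicator.
    for k in range(n_folds):
        held_out = folds == k
        model = fit_predictor(X[~held_out], y[~held_out])
        g[held_out] = model(X[held_out])
    # Step 2: regression adjustment with the centered prediction.
    gc = g - g.mean()
    Z = np.column_stack([np.ones(n), t, gc, t * gc])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    # Heteroskedasticity-robust (HC0) sandwich covariance of the coefficients.
    bread = np.linalg.inv(Z.T @ Z)
    meat = Z.T @ (Z * resid[:, None] ** 2)
    cov = bread @ meat @ bread
    return coef[1], np.sqrt(cov[1, 1])


# Synthetic example: true effect 0.5, outcome driven mostly by covariates.
rng = np.random.default_rng(1)
n = 20_000
X = rng.normal(size=(n, 3))
t = rng.integers(0, 2, size=n)
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + 0.5 * t + rng.normal(size=n)

est, se = mlrate_sketch(y, t, X)
# Difference in means and its usual (unequal-variance) standard error.
treated, control = y[t == 1], y[t == 0]
dm = treated.mean() - control.mean()
dm_se = np.sqrt(treated.var(ddof=1) / len(treated)
                + control.var(ddof=1) / len(control))
print(f"MLRATE-style: {est:.3f} (SE {se:.4f}); diff-in-means: {dm:.3f} (SE {dm_se:.4f})")
```

Because the treatment is randomized and the predictions use only covariates, the coefficient on `t` still targets the average treatment effect. When the predictions track the outcome closely, as in this synthetic setup, the robust standard error falls well below that of the difference-in-means estimator.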