Regression adjustment, sometimes known as Controlled-experiment Using Pre-Experiment Data (CUPED), is an important technique in internet experimentation. It decreases the variance of effect size estimates, often cutting confidence interval widths in half or more while never making them worse. It does so by carefully regressing the goal metric against pre-experiment features to reduce the variance. The tremendous gains of regression adjustment begs the question: How much better can we do by engineering better features from pre-experiment data, for example by using machine learning techniques or synthetic controls? Could we even reduce the variance in our effect sizes arbitrarily close to zero with the right predictors? Unfortunately, our answer is negative. A simple form of regression adjustment, which uses just the pre-experiment values of the goal metric, captures most of the benefit. Specifically, under a mild assumption that observations closer in time are easier to predict that ones further away in time, we upper bound the potential gains of more sophisticated feature engineering, with respect to the gains of this simple form of regression adjustment. The maximum reduction in variance is $50\%$ in Theorem 1, or equivalently, the confidence interval width can be reduced by at most an additional $29\%$.
翻译:暂无翻译