We study optimal variance reduction for count and ratio metrics in online controlled experiments. Our methods leverage flexible machine learning tools to incorporate covariates that are independent of the treatment but predictive of the outcomes, and employ cross-fitting to remove the bias introduced by complex machine learning models. We establish CLT-type asymptotic inference for our estimators under mild convergence conditions. Our procedures are optimal (efficient) for the corresponding targets as long as the machine learning estimators are consistent, with no requirement on their convergence rates. To complement the general optimal procedure, we also derive, as a special case, a linear adjustment method for ratio metrics that is computationally efficient and can flexibly incorporate any pre-treatment covariates. We evaluate the proposed variance reduction procedures in comprehensive simulation studies and provide practical suggestions regarding assumptions commonly adopted in computing ratio metrics. When tested on real online experiment data from LinkedIn, the proposed optimal procedure for ratio metrics reduces variance by up to 80\% compared to the standard difference-in-means estimator, and by up to a further 30\% compared to the CUPED approach, by going beyond linearity and incorporating a large number of extra covariates.
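To make the cross-fitting idea concrete, the following is a minimal Python sketch for a count-type (mean) metric. It uses a standard cross-fitted augmented inverse-propensity-weighted (AIPW) construction, which is an assumption made here for illustration and not necessarily the exact estimator developed in the paper. The function names (`fit_linear`, `cross_fitted_ate`, `diff_in_means`), the known assignment probability `p`, and the synthetic data are all hypothetical; the least-squares learner is only a stand-in for whatever machine learning model one would actually plug in.

```python
import numpy as np


def fit_linear(X, y):
    """Stand-in learner (assumption): ordinary least squares with an intercept.

    Returns a function that predicts outcomes for new covariates.
    """
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def cross_fitted_ate(y, t, X, p=0.5, n_folds=2, fit=fit_linear, seed=0):
    """Cross-fitted AIPW estimate of the average treatment effect.

    y: outcomes, t: 0/1 treatment indicator, X: pre-treatment covariates,
    p: known treatment probability of the randomized experiment.
    Returns (point estimate, standard error).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % n_folds  # random, near-equal fold labels
    mu1 = np.empty(n)
    mu0 = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Fit outcome models on the other folds only, then predict on fold k,
        # so each unit's prediction never uses that unit's own outcome.
        m1 = fit(X[train & (t == 1)], y[train & (t == 1)])
        m0 = fit(X[train & (t == 0)], y[train & (t == 0)])
        mu1[test] = m1(X[test])
        mu0[test] = m0(X[test])
    # Per-unit scores: model contrast plus inverse-probability-weighted residuals.
    psi = mu1 - mu0 + t * (y - mu1) / p - (1 - t) * (y - mu0) / (1 - p)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)


def diff_in_means(y, t):
    """Unadjusted difference-in-means estimate and its standard error."""
    y1, y0 = y[t == 1], y[t == 0]
    est = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return est, se


if __name__ == "__main__":
    # Synthetic experiment (hypothetical data): covariates explain most of the
    # outcome variance, and the true treatment effect is 0.5.
    rng = np.random.default_rng(1)
    n = 4000
    X = rng.normal(size=(n, 3))
    t = rng.binomial(1, 0.5, size=n)
    y = 2.0 * X[:, 0] + X[:, 1] + 0.5 * t + rng.normal(size=n)

    print("difference in means (est, se):", diff_in_means(y, t))
    print("cross-fitted AIPW   (est, se):", cross_fitted_ate(y, t, X))
```

On data like this, where the covariates are strongly predictive, the cross-fitted estimate should have a noticeably smaller standard error than the difference in means. Any learner that exposes the same fit-then-predict shape could be passed through the `fit` argument; ratio metrics would need an additional step that is not sketched here.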