We study optimal variance reduction for online controlled experiments, using flexible machine learning tools to incorporate covariates that are independent of the treatment but predictive of the outcomes. Employing cross-fitting, we propose variance reduction procedures for both count metrics and ratio metrics in online experiments, for which inference on the estimands remains valid under mild convergence conditions. We also establish the asymptotic optimality of all these procedures under a consistency condition on the machine learning estimators. To complement the proposed nonlinear optimal procedure, we derive a linear adjustment method for ratio metrics as a special case; it is computationally efficient and can flexibly incorporate any pre-treatment covariates. We perform comprehensive simulation studies and give practical suggestions. When tested on real online experiment data from LinkedIn, the proposed optimal procedure for ratio metrics reduces variance by up to $80\%$ relative to the standard difference-in-means estimator, and by up to a further $30\%$ relative to the CUPED approach, by going beyond linearity and incorporating a large number of extra covariates.
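To make the role of cross-fitting concrete, here is a minimal sketch of cross-fitted covariate adjustment for a count metric. It is a simplified illustration, not the optimal procedure proposed in the paper. It assumes Bernoulli(1/2) randomization, fits a single outcome model on pooled data from both arms, and uses ordinary least squares as a stand-in for an arbitrary machine learning regressor. The function names `fit_linear`, `predict_linear`, and `cross_fit_adjusted_effect` are hypothetical. Each unit's prediction comes from a model trained on the other folds, so it does not depend on that unit's own treatment assignment, and the difference in means of the residuals $Y - \hat m(X)$ stays unbiased while its variance shrinks by roughly the share of outcome variance the covariates explain.

```python
import numpy as np


def fit_linear(X, y):
    # OLS with an intercept; a stand-in (assumption) for any ML regressor.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def cross_fit_adjusted_effect(y, t, X, n_folds=5, seed=0):
    """Hypothetical helper: difference in means of out-of-fold residuals.

    Assumes treatment t in {0, 1} is randomized independently of X.
    Returns (effect estimate, standard error).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Randomly assign each unit to one of n_folds roughly equal folds.
    folds = rng.permutation(n) % n_folds
    m_hat = np.empty(n)
    for k in range(n_folds):
        held_out = folds == k
        # Train on the other folds, predict on the held-out fold.
        coef = fit_linear(X[~held_out], y[~held_out])
        m_hat[held_out] = predict_linear(coef, X[held_out])
    r = y - m_hat  # adjusted outcomes
    r1, r0 = r[t == 1], r[t == 0]
    tau = r1.mean() - r0.mean()
    se = np.sqrt(r1.var(ddof=1) / len(r1) + r0.var(ddof=1) / len(r0))
    return tau, se
```

On simulated data where three covariates explain most of the outcome variance, the adjusted standard error should be well below that of the unadjusted difference-in-means estimator; replacing `fit_linear` with a nonlinear learner is what lets such adjustment go beyond CUPED-style linear corrections.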