Estimating the effects of long-term treatments through A/B testing is challenging. Treatments such as updates to product functionalities, user interface designs, and recommendation algorithms are intended to persist in the system long after their initial launch. However, because long-term experiments are difficult to run, practitioners often rely on short-term experimental results to make product launch decisions. How to accurately estimate the effects of long-term treatments from short-term experimental data remains an open question. To address this question, we introduce a longitudinal surrogate framework that decomposes long-term effects into functions of user attributes, short-term metrics, and treatment assignments. We outline identification assumptions, estimation strategies, inferential techniques, and validation methods under this framework. Empirically, using data from two real-world experiments, each involving more than a million users on WeChat, one of the world's largest social networking platforms, we demonstrate that our approach outperforms existing solutions.