Participants in online experiments often enroll over time, which can compromise sample representativeness due to temporal shifts in covariates. This issue is particularly critical in A/B tests, the online controlled experiments widely used to evaluate product updates, since these tests are cost-sensitive and typically short in duration. We propose a novel framework that dynamically assesses sample representativeness by dividing the ongoing sampling process into three stages. We then develop stage-specific estimators of the Population Average Treatment Effect (PATE), ensuring that experimental results remain generalizable across varying experiment durations. Leveraging survival analysis, we construct a heuristic function that identifies these stages without requiring prior knowledge of population or sample characteristics, thereby keeping implementation costs low. Our approach bridges the gap between experimental findings and real-world applicability, enabling product decisions to be based on evidence that accurately represents the broader target population. We validate the effectiveness of our framework at three levels: (1) through a real-world online experiment conducted on WeChat; (2) via a synthetic experiment; and (3) through a platform-wide application to 600 A/B tests on WeChat. Additionally, we provide practical guidelines for practitioners implementing our method in real-world settings.