A/B testing, or online experimentation, is a standard business strategy for comparing a new product with an old one in the pharmaceutical, technological, and traditional industries. Major challenges arise in online experiments on two-sided marketplace platforms (e.g., Uber), where there is only one unit that receives a sequence of treatments over time. In such experiments, the treatment at a given time affects not only the current outcome but also future outcomes. The aim of this paper is to introduce a reinforcement learning framework for carrying out A/B testing in these experiments while characterizing long-term treatment effects. Our proposed testing procedure allows for sequential monitoring and online updating, and it is generally applicable to a variety of treatment designs in different industries. In addition, we systematically investigate the theoretical properties (e.g., size and power) of our testing procedure. Finally, we apply our framework to both simulated data and a real-world dataset obtained from a technology company to illustrate its advantages over current practice. A Python implementation of our test is available at https://github.com/callmespring/CausalRL.
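Why carryover matters can be seen in a toy simulation. The sketch below is not the paper's testing procedure: the linear state model, the constants `RHO`, `DELTA`, `TAU`, `NOISE`, and the helper `run` are all hypothetical, chosen only to show a single unit whose treatment today shifts tomorrow's state. It compares the long-term effect (average reward under "always treat" minus "always control") with a naive treated-vs-control comparison from an alternating design that ignores the carryover.

```python
import numpy as np

# Hypothetical toy model (not the paper's model): one unit, a scalar state s_t
# (think of a supply-demand index), a binary treatment a_t, and
#   reward:      y_t     = s_t + TAU * a_t + e_t
#   transition:  s_{t+1} = RHO * s_t + DELTA * a_t + u_t
RHO, DELTA, TAU, NOISE = 0.5, 0.3, 0.2, 0.1


def run(policy, horizon=100_000, seed=0):
    """Simulate one unit under policy(t) -> 0/1; return actions and rewards."""
    rng = np.random.default_rng(seed)
    e = rng.normal(scale=NOISE, size=horizon)  # reward noise
    u = rng.normal(scale=NOISE, size=horizon)  # state-transition noise
    actions = np.array([policy(t) for t in range(horizon)], dtype=float)
    rewards = np.empty(horizon)
    s = 0.0
    for t in range(horizon):
        rewards[t] = s + TAU * actions[t] + e[t]
        s = RHO * s + DELTA * actions[t] + u[t]  # today's action moves tomorrow's state
    return actions, rewards


# Long-term effect: average reward of "always treat" minus "always control".
# Reusing the same seed (common random numbers) makes the noise cancel.
_, y_treat = run(lambda t: 1)
_, y_ctrl = run(lambda t: 0)
long_term_effect = y_treat.mean() - y_ctrl.mean()

# Naive estimate from an alternating design: compare treated vs control periods,
# ignoring that each period's treatment carries over into the next state.
a_alt, y_alt = run(lambda t: t % 2)
naive_effect = y_alt[a_alt == 1].mean() - y_alt[a_alt == 0].mean()

# Closed form for this toy model: TAU + DELTA / (1 - RHO) = 0.8
print(f"long-term effect ~ {long_term_effect:.2f}")
print(f"naive alternating estimate ~ {naive_effect:.2f}")
```

With these made-up constants, the long-term effect is TAU + DELTA / (1 - RHO) = 0.8, yet under alternation the state settles near 0.4 before control periods and 0.2 before treated periods, so both period means are about 0.4 and the naive estimate is close to zero. The point is only that ignoring carryover can badly misstate the long-term effect.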