Randomized experiments (a.k.a. A/B tests) are a powerful tool for estimating treatment effects and informing decision making in business, healthcare, and other applications. In many problems, the treatment has a lasting effect that evolves over time. A limitation of randomized experiments is that they do not easily extend to measuring long-term effects, since running long experiments is time-consuming and expensive. In this paper, we take a reinforcement learning (RL) approach that estimates the average reward in a Markov process. Motivated by real-world scenarios where the observed state transitions are nonstationary, we develop a new algorithm for a class of nonstationary problems, and demonstrate promising results on two synthetic datasets and one online store dataset.
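For concreteness, the quantity such an average-reward RL approach estimates is typically the long-run average-reward criterion; the formulation below is standard background, sketched here rather than quoted from the paper, with $\pi$ denoting the (treatment) policy and $r$ the per-step reward:

\[
\rho^{\pi} \;=\; \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\left[\sum_{t=1}^{T} r(s_t, a_t)\right],
\]

where the states $s_t$ evolve according to the Markov process induced by $\pi$. Under nonstationary transitions this limit need not exist in the usual sense, which is the difficulty that motivates developing an algorithm for the nonstationary setting.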