Online reinforcement learning and other adaptive sampling algorithms are increasingly used in digital intervention experiments to optimize treatment delivery to users over time. In this work, we focus on longitudinal user data collected by a large class of adaptive sampling algorithms that are designed to optimize treatment decisions online using accruing data from multiple users. Combining, or ``pooling,'' data across users allows adaptive sampling algorithms to potentially learn faster. However, by pooling, these algorithms induce dependence between the sampled user data trajectories; we show that this dependence can cause standard variance estimators designed for i.i.d. data to underestimate the true variance of common estimators on this data type. We develop novel methods to perform a variety of statistical analyses on such adaptively sampled data via Z-estimation. Specifically, we introduce the \textit{adaptive} sandwich variance estimator, a corrected sandwich estimator that yields consistent variance estimates under adaptive sampling. Additionally, to prove our results we develop novel theoretical tools for empirical processes on non-i.i.d., adaptively sampled longitudinal data, which may be of independent interest. This work is motivated by our efforts to design digital intervention experiments in which online reinforcement learning algorithms optimize treatment decisions, yet where statistical inference is essential for analyzing the data once the experiment concludes.
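For context, the display below sketches the classical sandwich variance estimator for a Z-estimator under i.i.d. sampling; the notation ($D_i$, $\psi$, $\hat\theta_n$, $\hat B_n$, $\hat M_n$, $\hat\Sigma_n$) is illustrative and not taken from the paper's formal development. A Z-estimator $\hat\theta_n$ solves $\sum_{i=1}^n \psi(D_i; \hat\theta_n) = 0$, and the standard i.i.d. sandwich variance estimator is
\[
\hat\Sigma_n \;=\; \hat B_n^{-1}\,\hat M_n\,\hat B_n^{-\top},
\qquad
\hat B_n = \frac{1}{n}\sum_{i=1}^n \partial_\theta \psi(D_i;\theta)\big|_{\theta=\hat\theta_n},
\qquad
\hat M_n = \frac{1}{n}\sum_{i=1}^n \psi(D_i;\hat\theta_n)\,\psi(D_i;\hat\theta_n)^{\top},
\]
so that $\sqrt{n}\,(\hat\theta_n - \theta^*)$ is approximately $\mathcal{N}(0,\hat\Sigma_n)$ in large samples when the $D_i$ are i.i.d. When data are pooled by an adaptive sampling algorithm, the user trajectories $D_i$ are no longer independent and this construction can understate the variability of the estimating function; the adaptive sandwich variance estimator introduced above corrects for this, with its exact form developed in the body of the paper.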