Statistical Inference with M-Estimators on Bandit Data

Bandit algorithms are increasingly used in real-world sequential decision-making problems, from online advertising to mobile health. As a result, more datasets are being collected with bandit algorithms, and with them comes a growing desire to use these datasets to answer scientific questions such as: Did one type of ad increase the click-through rate more or lead to more purchases? In which contexts is a mobile health intervention effective? However, it has been shown that classical statistical approaches, like those based on the ordinary least squares estimator, fail to provide reliable confidence intervals when used with bandit data. Recently, methods have been developed to conduct statistical inference using simple models fit to data collected with multi-armed bandits. However, there is a lack of general methods for conducting statistical inference using more complex models. In this work, we develop theory justifying the use of M-estimation (Van der Vaart, 2000), traditionally used with i.i.d. data, to provide inferential methods for a large class of estimators, including least squares and maximum likelihood estimators, but now with data collected with (contextual) bandit algorithms. To do this, we generalize the use of adaptive weights pioneered by Hadad et al. (2019) and Deshpande et al. (2018). Specifically, in settings in which the data is collected via a (contextual) bandit algorithm, we prove that certain adaptively weighted M-estimators are uniformly asymptotically normal, and we demonstrate empirically that their asymptotic distribution can be used to construct reliable confidence regions for a variety of inferential targets.
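As a rough illustration of the adaptive-weighting idea, the sketch below simulates a two-armed epsilon-greedy bandit and computes a weighted least squares estimate of each arm's mean reward. The abstract does not specify the weights, so the square-root inverse-propensity weights relative to a uniform reference policy used here are an assumption made for illustration. The function names `run_epsilon_greedy` and `weighted_arm_means`, and the choice of epsilon-greedy itself, are likewise hypothetical.

```python
import numpy as np


def run_epsilon_greedy(T, true_means, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit with unit-variance Gaussian rewards.

    Returns the chosen arms, the observed rewards, and the probability
    with which each chosen arm was selected (its propensity).
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    actions = np.empty(T, dtype=int)
    rewards = np.empty(T)
    propensities = np.empty(T)
    for t in range(T):
        if counts.min() == 0:
            # Explore uniformly until every arm has been pulled once.
            probs = np.full(n_arms, 1.0 / n_arms)
        else:
            greedy = int(np.argmax(sums / counts))
            probs = np.full(n_arms, epsilon / n_arms)
            probs[greedy] += 1.0 - epsilon
        a = int(rng.choice(n_arms, p=probs))
        y = true_means[a] + rng.normal()
        counts[a] += 1
        sums[a] += y
        actions[t], rewards[t], propensities[t] = a, y, probs[a]
    return actions, rewards, propensities


def weighted_arm_means(actions, rewards, weights, n_arms):
    """Weighted least squares estimate of each arm's mean reward.

    For an arm-indicator model, minimizing sum_t w_t * (y_t - theta[a_t])**2
    gives a weighted average of the rewards observed on each arm.
    """
    estimates = np.empty(n_arms)
    for a in range(n_arms):
        mask = actions == a
        estimates[a] = np.sum(weights[mask] * rewards[mask]) / np.sum(weights[mask])
    return estimates


if __name__ == "__main__":
    true_means = np.array([0.0, 0.5])
    actions, rewards, propensities = run_epsilon_greedy(5000, true_means)
    # Assumed weights: sqrt(uniform reference probability / actual propensity).
    weights = np.sqrt((1.0 / len(true_means)) / propensities)
    print("unweighted:", weighted_arm_means(actions, rewards, np.ones_like(rewards), 2))
    print("weighted:  ", weighted_arm_means(actions, rewards, weights, 2))
```

With constant weights the estimator reduces to the ordinary per-arm sample mean, which is the unweighted least squares fit. The sketch only computes point estimates; building confidence regions from the asymptotic distribution requires the variance results developed in the paper and is not attempted here.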
