The sample mean is among the most well-studied estimators in statistics, having many desirable properties such as unbiasedness and consistency. However, when analyzing data collected using a multi-armed bandit (MAB) experiment, the sample mean is biased and much remains to be understood about its properties. For example, when is it consistent, how large is its bias, and can we bound its mean squared error? This paper delivers a thorough and systematic treatment of the bias, risk and consistency of MAB sample means. Specifically, we identify four distinct sources of selection bias (sampling, stopping, choosing and rewinding) and analyze them both separately and together. We further demonstrate that a new notion of \emph{effective sample size} can be used to bound the risk of the sample mean under suitable loss functions. We present several carefully designed examples to provide intuition on the different sources of selection bias we study. Our treatment is nonparametric and algorithm-agnostic, meaning that it is not tied to a specific algorithm or goal. In a nutshell, our proofs combine variational representations of information-theoretic divergences with new martingale concentration inequalities.
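As a toy illustration of the bias induced by adaptive sampling, the sketch below computes the exact expected sample mean of one arm in a two-armed Bernoulli bandit run with a greedy rule. This example is not taken from the paper: the greedy rule, the tie-breaking toward arm 0, the identical Bernoulli arms and the function name `expected_mean_arm0` are all assumptions made for illustration. The expectation is obtained by exact enumeration with rational arithmetic, so no Monte Carlo error is involved.

```python
from fractions import Fraction
from functools import lru_cache


def expected_mean_arm0(p, horizon):
    """Exact expected sample mean of arm 0 after `horizon` total pulls.

    Hypothetical setup (an assumption, not the paper's): both arms are
    Bernoulli(p); each arm is pulled once, then the arm with the larger
    current sample mean is pulled, with ties broken in favor of arm 0.
    Requires horizon >= 2 so that arm 0 is pulled at least once.
    """
    p = Fraction(p)

    @lru_cache(maxsize=None)
    def go(n0, s0, n1, s1):
        # n_k = number of pulls of arm k, s_k = number of successes of arm k.
        if n0 + n1 == horizon:
            return Fraction(s0, n0)
        if n0 == 0:
            arm = 0  # initial pull of arm 0
        elif n1 == 0:
            arm = 1  # initial pull of arm 1
        else:
            # Greedy choice; a tie goes to arm 0.
            arm = 0 if Fraction(s0, n0) >= Fraction(s1, n1) else 1
        if arm == 0:
            return p * go(n0 + 1, s0 + 1, n1, s1) + (1 - p) * go(n0 + 1, s0, n1, s1)
        return p * go(n0, s0, n1 + 1, s1 + 1) + (1 - p) * go(n0, s0, n1 + 1, s1)

    return go(0, 0, 0, 0)


half = Fraction(1, 2)
print(expected_mean_arm0(half, 2))  # no adaptivity yet: prints 1/2
print(expected_mean_arm0(half, 3))  # one adaptive pull: prints 7/16
```

With a true mean of 1/2, a single adaptive pull already moves the expected sample mean to 7/16, a bias of -1/16 in this toy setting. With only the two initial pulls, where no data-dependent choice has been made, the sample mean is exactly unbiased.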