The bias of the sample means of the arms in multi-armed bandits is an important issue in adaptive data analysis that has recently received considerable attention in the literature. Existing results precisely relate the sign and magnitude of the bias to various sources of data adaptivity, but they do not apply to the conditional inference setting, in which the sample means are computed only when specific conditions are satisfied. In this paper, we characterize the sign of the conditional bias of monotone functions of the rewards, including the sample mean. Our results hold for arbitrary conditioning events and leverage natural monotonicity properties of the data collection policy. We further demonstrate, through several examples from sequential testing and best arm identification, that the conditional and marginal biases of the sample mean of an arm can differ in sign, depending on the conditioning event. Our analysis offers new perspectives on the subtleties of assessing bias in data-adaptive settings.
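To make the distinction between marginal and conditional bias concrete, the following Monte Carlo sketch may help. It is an illustrative toy setup, not one of the paper's examples: a two-armed greedy bandit in which both arms have standard normal rewards (true mean 0), each arm is pulled once, and one further pull goes to the arm with the larger sample mean. The function name `simulate_greedy`, the horizon of 3 pulls, and the conditioning event "arm 0 received more pulls than arm 1" are all assumptions chosen for simplicity. In this toy case the two biases can be worked out by hand as $-1/(4\sqrt{\pi}) \approx -0.141$ (marginal) and $1/(2\sqrt{\pi}) \approx 0.282$ (conditional), so the signs differ.

```python
import numpy as np


def simulate_greedy(n_runs=200_000, horizon=3, seed=0):
    """Simulate a two-armed greedy bandit with N(0, 1) rewards on both arms.

    Hypothetical helper for illustration only. Returns the final sample
    means and pull counts, each as an array of shape (n_runs, 2).
    """
    rng = np.random.default_rng(seed)
    # Initialization: pull each arm once in every simulated run.
    sums = rng.standard_normal((n_runs, 2))
    counts = np.ones((n_runs, 2), dtype=int)
    rows = np.arange(n_runs)
    for _ in range(horizon - 2):
        # Greedy step: pull the arm with the larger current sample mean.
        arm = np.argmax(sums / counts, axis=1)
        sums[rows, arm] += rng.standard_normal(n_runs)
        counts[rows, arm] += 1
    return sums / counts, counts


means, counts = simulate_greedy()
true_mean = 0.0  # both arms are N(0, 1) in this toy setup

# Marginal bias: average the sample mean of arm 0 over all runs.
marginal_bias = means[:, 0].mean() - true_mean

# Conditional bias: average only over runs in which arm 0 was pulled
# more often than arm 1 (the assumed conditioning event).
favored = counts[:, 0] > counts[:, 1]
conditional_bias = means[favored, 0].mean() - true_mean

print(f"marginal bias of arm 0:    {marginal_bias:+.3f}")
print(f"conditional bias of arm 0: {conditional_bias:+.3f}")
```

The intuition is that low initial draws are frozen because the arm is not pulled again, while high initial draws are averaged toward the true mean, which produces a negative marginal bias. Conditioning on the arm having been favored selects exactly the runs with a high initial draw, which flips the sign.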