We propose a new strategy for fixed-confidence best-arm identification among Gaussian variables with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: we also prove non-asymptotic bounds that hold with high probability. To the best of our knowledge, this is the first strategy with such guarantees. Its main advantage over other algorithms such as Track-and-Stop, however, is its improved exploration behavior: Exploration-Biased Sampling is slightly biased in favor of exploration in a subtle but natural way that makes it more stable and interpretable. These improvements are made possible by a new analysis of the sample complexity optimization problem, which yields a faster numerical resolution scheme and several quantitative regularity results that we believe to be of high independent interest.