The problem of identifying the best arm among a collection of items with Gaussian reward distributions is well understood when the variances are known. Despite its practical relevance for many applications, few works have studied it for unknown variances. In this paper we introduce and analyze two approaches to deal with unknown variances, either by plugging in the empirical variance or by adapting the transportation costs. To calibrate our two stopping rules, we derive new time-uniform concentration inequalities, which are of independent interest. Then, we illustrate the theoretical and empirical performance of our two sampling rule wrappers on Track-and-Stop and on a Top Two algorithm. Finally, by quantifying the impact of not knowing the variances on the sample complexity, we show that this impact is rather small.