We consider $K$-armed stochastic bandits and study cumulative regret bounds up to time $T$. We are interested in strategies that simultaneously achieve a distribution-free regret bound of optimal order $\sqrt{KT}$ and a distribution-dependent regret that is asymptotically optimal, that is, matching the $\kappa\ln T$ lower bound of Lai and Robbins (1985) and Burnetas and Katehakis (1996), where $\kappa$ is the optimal problem-dependent constant. This constant $\kappa$ depends on the model $\mathcal{D}$ considered (the family of possible distributions over the arms). M\'enard and Garivier (2017) provided strategies achieving such a bi-optimality in the parametric case of models given by one-dimensional exponential families, while Lattimore (2016, 2018) did so for the family of (sub)Gaussian distributions with variance less than $1$. We extend this result to the non-parametric case of all distributions over $[0,1]$. We do so by combining the MOSS strategy of Audibert and Bubeck (2009), which enjoys a distribution-free regret bound of optimal order $\sqrt{KT}$, with the KL-UCB strategy of Capp\'e et al. (2013), for which we provide in passing the first analysis of an optimal distribution-dependent $\kappa\ln T$ regret bound in the model of all distributions over $[0,1]$. We obtained this non-parametric bi-optimality result while working hard to streamline the proofs (of previously known regret bounds and thus of the new analyses carried out); a second merit of the present contribution is therefore to provide a review of proofs of classical regret bounds for index-based strategies for $K$-armed stochastic bandits.
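For concreteness, the two targets can be written as follows. This is a sketch in standard bandit notation that is assumed here for illustration and is not fixed by the text above: $\nu_a \in \mathcal{D}$ is the distribution of arm $a$ with mean $\mu_a$, $\mu^\star = \max_a \mu_a$, $\Delta_a = \mu^\star - \mu_a$, $Y_t$ is the reward obtained at round $t$, $N_a(T)$ is the number of pulls of arm $a$ up to time $T$, $\mathrm{E}(\nu')$ is the mean of $\nu'$, and $C$ is an unspecified numerical constant.

$$
R_T \;=\; T\mu^\star - \mathbb{E}\Bigl[\sum_{t=1}^{T} Y_t\Bigr] \;=\; \sum_{a=1}^{K} \Delta_a\,\mathbb{E}\bigl[N_a(T)\bigr],
$$

$$
\text{(distribution-free)}\quad \sup_{\nu_1,\dots,\nu_K \in \mathcal{D}} R_T \;\le\; C\sqrt{KT},
\qquad
\text{(distribution-dependent)}\quad \limsup_{T\to\infty} \frac{R_T}{\ln T} \;\le\; \kappa \;=\; \sum_{a\,:\,\Delta_a>0} \frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a,\mu^\star,\mathcal{D})},
$$

$$
\mathcal{K}_{\inf}(\nu_a,\mu^\star,\mathcal{D}) \;=\; \inf\bigl\{\mathrm{KL}(\nu_a,\nu') \;:\; \nu'\in\mathcal{D},\ \mathrm{E}(\nu')>\mu^\star\bigr\}.
$$

Under this notation, the dependence of $\kappa$ on the model $\mathcal{D}$ enters only through $\mathcal{K}_{\inf}$: a larger model allows more alternative distributions $\nu'$, which can only decrease the infimum and hence increase $\kappa$.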