This paper focuses on best arm identification (BAI) in stochastic multi-armed bandits (MABs) in the fixed-confidence, parametric setting. In such pure exploration problems, the accuracy of the sampling strategy critically hinges on the sequential allocation of the sampling resources among the arms. The existing approaches to BAI address the following question: what is an optimal sampling strategy when we spend a $\beta$ fraction of the samples on the best arm? These approaches treat $\beta$ as a tunable parameter and offer efficient algorithms that are optimal only up to the choice of $\beta$, a guarantee referred to as $\beta$-optimality. However, the BAI decisions and performance can be highly sensitive to the choice of $\beta$. This paper provides a BAI algorithm that is agnostic to $\beta$, dispensing with the need for tuning $\beta$, and specifies an optimal allocation strategy, including the optimal value of $\beta$. Furthermore, the existing relevant literature focuses on the family of exponential distributions. This paper considers the more general setting of an arbitrary family of distributions parameterized by their mean values (under mild regularity conditions).