We consider the fixed-budget best arm identification problem where the goal is to find the arm of the largest mean with a fixed number of samples. It is known that the probability of misidentifying the best arm is exponentially small to the number of rounds. However, limited characterizations have been discussed on the rate (exponent) of this value. In this paper, we characterize the optimal rate as a result of global optimization over all possible parameters. We introduce two rates, $R^{\mathrm{go}}$ and $R^{\mathrm{go}}_{\infty}$, corresponding to lower bounds on the misidentification probability, each of which is associated with a proposed algorithm. The rate $R^{\mathrm{go}}$ is associated with $R^{\mathrm{go}}$-tracking, which can be efficiently implemented by a neural network and is shown to outperform existing algorithms. However, this rate requires a nontrivial condition to be achievable. To deal with this issue, we introduce the second rate $R^{\mathrm{go}}_\infty$. We show that this rate is indeed achievable by introducing a conceptual algorithm called delayed optimal tracking (DOT).
翻译:我们考虑的是固定预算最佳手臂识别问题,目标是找到最大平均值的手臂,并有固定数量的样本。已知误认最佳手臂的可能性极小,与弹道数量相比是成倍的。然而,对这一数值的速率(用量)进行了有限的定性讨论。在本文中,我们根据全球优化对所有可能的参数的最佳速率进行了定性。我们引入了两种汇率,即美元和美元,相当于误辨概率的下限,每种误辨概率都与提议的算法有关。美元-马特尔姆{戈涅因菲特元与美元-跟踪有关,可以通过神经网络高效地实施,并显示超过现有的算法。然而,这一比率需要一种非边际条件才能实现。为了解决这个问题,我们引入了第二种汇率 美元-马特尔姆{戈涅因菲特$。我们表明,采用一种称为“最佳跟踪”的概念性算法确实可以实现这一速率。