Minimax Optimal Algorithms for Fixed-Budget Best Arm Identification

We consider the fixed-budget best arm identification problem, where the goal is to find the arm with the largest mean using a fixed number of samples. It is known that the probability of misidentifying the best arm decays exponentially in the number of rounds. However, the rate (exponent) of this decay has been characterized only to a limited extent. In this paper, we characterize the minimax optimal rate as the result of an optimization over all possible parameters. We introduce two rates, $R^{\mathrm{go}}$ and $R^{\mathrm{go}}_{\infty}$, corresponding to lower bounds on the probability of misidentification, each of which is associated with a proposed algorithm. The rate $R^{\mathrm{go}}$ is associated with $R^{\mathrm{go}}$-tracking, which can be efficiently implemented by a neural network and is shown to outperform existing algorithms. However, this rate requires a nontrivial condition to be achievable. To address this issue, we introduce the second rate $R^{\mathrm{go}}_{\infty}$. We show that this rate is indeed achievable by introducing a conceptual algorithm called delayed optimal tracking (DOT).
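To make the problem setting concrete, the following is a minimal sketch of fixed-budget best arm identification. It is not $R^{\mathrm{go}}$-tracking or DOT: it uses plain uniform allocation as a stand-in baseline, and the Bernoulli reward model, the function names, and the example means are all assumptions made for illustration. It estimates by simulation how often the best arm is misidentified at a given budget.

```python
import random


def uniform_allocation_bai(means, budget, rng):
    """Spend the budget evenly over the arms and recommend one arm.

    Assumption: rewards are Bernoulli with the given means. This is a
    simple baseline, not an algorithm from the paper.
    Ties in the empirical means go to the lowest arm index.
    """
    k = len(means)
    pulls = budget // k  # any remainder of the budget is left unused
    estimates = []
    for mu in means:
        successes = sum(1 for _ in range(pulls) if rng.random() < mu)
        estimates.append(successes / pulls)
    return max(range(k), key=lambda i: estimates[i])


def misidentification_rate(means, budget, trials=2000, seed=0):
    """Monte Carlo estimate of the probability of recommending a wrong arm."""
    rng = random.Random(seed)
    best = max(range(len(means)), key=lambda i: means[i])
    errors = sum(
        uniform_allocation_bai(means, budget, rng) != best
        for _ in range(trials)
    )
    return errors / trials


if __name__ == "__main__":
    example_means = [0.5, 0.4, 0.3]  # hypothetical instance
    for budget in (30, 90, 270):
        print(budget, misidentification_rate(example_means, budget))
```

Running the script prints the estimated error frequency for each budget; for this instance the estimates should shrink as the budget grows, which is the decay whose exponent the abstract refers to as the rate.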
