We consider Bayesian best arm identification in the multi-armed bandit problem. Assuming certain continuity conditions on the prior, we characterize the rate of the Bayesian simple regret. Unlike in Bayesian regret minimization (Lai, 1987), the leading factor in the Bayesian simple regret derives from the region where the gap between the optimal and sub-optimal arms is smaller than $\sqrt{\frac{\log T}{T}}$. We propose a simple and easy-to-compute algorithm whose leading factor matches the lower bound up to a constant factor; simulation results support our theoretical findings.
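For orientation, the Bayesian simple regret can be written in its standard form as below. This is a sketch only: the notation is assumed here and does not come from the abstract, with $H$ denoting the prior over the vector of arm means $\boldsymbol{\mu} = (\mu_1, \dots, \mu_K)$, $K$ the number of arms, and $J(T)$ the arm recommended after $T$ rounds.

$$
\mathbb{E}_{\boldsymbol{\mu} \sim H}\left[ \mathbb{E}_{\boldsymbol{\mu}}\left[ \max_{i \in \{1,\dots,K\}} \mu_i - \mu_{J(T)} \right] \right]
$$

The inner expectation is the usual (frequentist) simple regret for a fixed $\boldsymbol{\mu}$; the outer expectation averages it over the prior, which is why instances with small gaps can contribute to the leading term.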