Bayesian bandit algorithms with approximate inference have been widely used in practice with superior performance, yet few studies offer a fundamental understanding of that performance. In this paper, we propose a Bayesian bandit algorithm, which we call Generalized Bayesian Upper Confidence Bound (GBUCB), for bandit problems in the presence of approximate inference. Our theoretical analysis shows that, in the Bernoulli multi-armed bandit, GBUCB achieves $O(\sqrt{T}(\log T)^c)$ frequentist regret if the inference error, measured by the symmetrized Kullback-Leibler divergence, is controllable. This analysis relies on a novel sensitivity analysis for quantile shifts with respect to inference errors. To the best of our knowledge, our work provides the first theoretical regret bound that is better than $o(T)$ in the setting of approximate inference. Our experimental evaluations on multiple approximate inference settings corroborate our theory, showing that GBUCB is consistently superior to BUCB and Thompson sampling.
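For orientation, the quantile-based index that BUCB-style methods rely on can be sketched for the Bernoulli case. The code below is a minimal illustration of the standard Bayes-UCB baseline with exact Beta posteriors, not the GBUCB algorithm proposed here; the function names, the uniform Beta(1, 1) prior, and the quantile level $1 - 1/(t(\log T)^c)$ are assumptions made for this sketch. With approximate inference, the exact posterior quantile computed here would be replaced by a quantile of the approximate posterior, which is where the quantile-shift sensitivity matters.

```python
import math
import random


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x, for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a),
    with each term evaluated in log space to avoid overflow.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    log_x, log_1mx = math.log(x), math.log1p(-x)
    total = 0.0
    for j in range(a, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1)
                    - math.lgamma(n - j + 1)
                    + j * log_x + (n - j) * log_1mx)
        total += math.exp(log_term)
    return min(total, 1.0)


def beta_quantile_int(q, a, b, tol=1e-9):
    """Quantile of Beta(a, b) at level q, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bayes_ucb(true_means, horizon, c=0.0, seed=0):
    """Run Bayes-UCB on a Bernoulli bandit; return the pull count per arm.

    Assumes horizon >= 2 and a Beta(1, 1) prior on every arm (illustrative).
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for t in range(1, horizon + 1):
        # Quantile level 1 - 1 / (t * (log T)^c); c = 0 gives 1 - 1/t.
        level = 1.0 - 1.0 / (t * math.log(horizon) ** c)
        # Index of each arm: posterior quantile of Beta(1 + S_i, 1 + F_i).
        indices = [
            beta_quantile_int(level, 1 + successes[i], 1 + failures[i])
            for i in range(k)
        ]
        arm = max(range(k), key=lambda i: indices[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
    return [successes[i] + failures[i] for i in range(k)]


if __name__ == "__main__":
    # Two arms with a large gap; the better arm should dominate the pulls.
    print(bayes_ucb([0.1, 0.9], horizon=200))
```

The integer-parameter identity keeps the sketch free of external dependencies; a library routine for Beta quantiles would be the more usual choice for non-integer posterior parameters.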