Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable. In this paper, we establish a regret bound that ensures desirable behavior when ensemble sampling is applied to the linear bandit problem. This represents the first rigorous regret analysis of ensemble sampling and is made possible by leveraging information-theoretic concepts and novel analytic techniques that may prove useful beyond the scope of this paper.
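For intuition, here is a minimal sketch of how ensemble sampling might be instantiated for a linear bandit. The abstract does not spell out the algorithm, so everything here is an assumption made for illustration. It uses the commonly described scheme: each ensemble member starts from an independent prior draw, is fit to independently noise-perturbed rewards, and at each step one member is picked uniformly at random and followed greedily. The names `EnsembleSampler` and `run_demo`, the Gaussian prior, the noise level, the ensemble size, and the three-action toy problem are all hypothetical.

```python
import random


def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class EnsembleSampler:
    """Hypothetical ensemble sampler for a linear bandit (illustrative only).

    Assumes a N(0, prior_std^2 I) prior and Gaussian reward noise. All models
    see the same actions, so they share one covariance matrix P; they differ
    only in their prior draw and in the noise added to each observed reward.
    """

    def __init__(self, dim, num_models, prior_std=1.0, noise_std=1.0, seed=0):
        self.rng = random.Random(seed)
        self.noise_std = noise_std
        # P = (I / prior_std^2 + sum_t x_t x_t^T / noise_std^2)^{-1}
        self.P = [[prior_std ** 2 if i == j else 0.0 for j in range(dim)]
                  for i in range(dim)]
        # b_m = prior_draw_m / prior_std^2 + sum_t x_t (y_t + w_{t,m}) / noise_std^2
        self.b = [[self.rng.gauss(0.0, prior_std) / prior_std ** 2
                   for _ in range(dim)] for _ in range(num_models)]

    def select(self, actions):
        """Pick one model uniformly at random and act greedily under it."""
        m = self.rng.randrange(len(self.b))
        theta = mat_vec(self.P, self.b[m])
        return max(range(len(actions)), key=lambda i: dot(actions[i], theta))

    def update(self, x, reward):
        """Incorporate one (action, reward) pair into every model."""
        s2 = self.noise_std ** 2
        Px = mat_vec(self.P, x)
        denom = s2 + dot(x, Px)
        d = len(x)
        # Sherman-Morrison rank-one update of the shared covariance.
        for i in range(d):
            for j in range(d):
                self.P[i][j] -= Px[i] * Px[j] / denom
        # Each model sees the reward with its own independent perturbation.
        for b_m in self.b:
            perturbed = reward + self.rng.gauss(0.0, self.noise_std)
            for i in range(d):
                b_m[i] += x[i] * perturbed / s2


def run_demo(steps=2000, seed=1):
    """Run the sampler on a toy 2-d problem; return pull counts per action."""
    env_rng = random.Random(seed)
    theta_true = [1.0, -0.5]                         # assumed, unknown to agent
    actions = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]   # assumed finite action set
    agent = EnsembleSampler(dim=2, num_models=10, seed=seed)
    counts = [0] * len(actions)
    for _ in range(steps):
        a = agent.select(actions)
        reward = dot(actions[a], theta_true) + env_rng.gauss(0.0, 1.0)
        agent.update(actions[a], reward)
        counts[a] += 1
    return counts


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is the cost profile: no posterior is ever sampled from, and the only per-step randomness is the choice of model and the reward perturbations. This is what makes the approach attractive when exact Thompson sampling is intractable. In the linear-Gaussian toy case the exact posterior is actually available, so the example only illustrates the mechanics.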