In this note, we introduce a randomized version of the well-known elliptical potential lemma, which is widely used in the analysis of algorithms for sequential learning and decision-making problems such as stochastic linear bandits. Our randomized elliptical potential lemma relaxes the Gaussian assumptions on the observation noise and on the prior distribution of the problem parameters. We then use this generalization to prove an improved Bayesian regret bound for Thompson sampling in stochastic linear bandits with changing action sets, where the prior and noise distributions are general. This bound is minimax optimal up to constant factors.
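For context, one commonly cited deterministic form of the elliptical potential lemma is sketched below; the notation ($x_t$, $V_t$, $L$, $\lambda$) is assumed here for illustration, and the randomized variant introduced in this note is not reproduced. For actions $x_1, \dots, x_T \in \mathbb{R}^d$ with $\lVert x_t \rVert_2 \le L$ and Gram matrices $V_t = \lambda I + \sum_{s=1}^{t} x_s x_s^{\top}$,
\[
  \sum_{t=1}^{T} \min\!\left(1,\; \lVert x_t \rVert_{V_{t-1}^{-1}}^{2}\right)
  \;\le\; 2d \log\!\left(1 + \frac{T L^{2}}{d\lambda}\right).
\]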