In this work, we derive sharp non-asymptotic deviation bounds for weighted sums of Dirichlet random variables. These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum. This representation allows us to obtain a Gaussian-like approximation for the sum distribution using geometry and complex analysis methods. Our results generalize similar bounds for the Beta distribution obtained in the seminal paper Alfers and Dinges [1984]. Additionally, our results can be considered a sharp non-asymptotic version of the inverse of Sanov's theorem studied by Ganesh and O'Connell [1999] in the Bayesian setting. Based on these results, we derive new deviation bounds for the Dirichlet process posterior means with application to Bayesian bootstrap. Finally, we apply our estimates to the analysis of the Multinomial Thompson Sampling (TS) algorithm in multi-armed bandits and significantly sharpen the existing regret bounds by making them independent of the size of the arms distribution support.
翻译:翻译摘要:
在本文中,我们推导出对Dirichlet随机变量加权求和的尖锐非渐进偏差界限。这些界限基于加权Dirichlet求和密度的新型积分表示。该表示使我们能够使用几何和复分析方法获得类似于高斯的求和分布近似。我们的结果推广了Alfers和Dinges[1984]的Beta分布界限。此外,我们的结果可以认为是Ganesh和O'Connell[1999]在贝叶斯设置下逆Sanov定理的尖锐非渐进版本。基于这些结果,我们给出了Dirichlet过程后验均值的新的偏差界限并在贝叶斯自助法方面应用了这一结果。最后,我们将我们的估计应用于多臂赌博机的多项式汤普森采样(TS)算法的分析,并通过使其与武器分布支持的尺寸无关,显着改进了现有的遗憾界限。