Motivated by the pressing need for efficient optimization in online recommender systems, we revisit the cascading bandit model proposed by Kveton et al. (2015). While Thompson sampling (TS) algorithms have been shown to be empirically superior to Upper Confidence Bound (UCB) algorithms for cascading bandits, theoretical guarantees are known only for the latter. In this paper, we first provide a problem-dependent upper bound on the regret of a TS algorithm with Beta-Bernoulli updates; this upper bound is tighter than a recent derivation under a more general setting by Huyuk and Tekin (2019). Next, we design and analyze another TS algorithm with Gaussian updates, TS-Cascade, which achieves the state-of-the-art regret bound for cascading bandits. Complementarily, we consider a linear generalization of the cascading bandit model, which allows efficient learning in large cascading bandit problem instances. For this linear model, we introduce and analyze a TS algorithm that enjoys a regret bound depending on the dimension of the linear model but not on the number of items. Finally, by using information-theoretic techniques and judiciously constructing cascading bandit instances, we derive a nearly matching regret lower bound for the standard model. Our paper establishes the first theoretical guarantees on TS algorithms for a stochastic combinatorial bandit problem with partial feedback. Numerical experiments demonstrate the superiority of the proposed TS algorithms compared to existing UCB-based ones.
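To make the setting concrete, the following is a minimal sketch of Thompson sampling with Beta-Bernoulli updates on a simulated cascading bandit. It is an illustration under stated assumptions, not the paper's exact algorithm: the function name `ts_cascade_beta`, the uniform Beta(1, 1) priors, and the simulated user with click probabilities `weights` are all hypothetical choices made here. The user examines the recommended list from the top, clicks the first attractive item, and stops, so items below the click are never observed (partial feedback).

```python
import random


def ts_cascade_beta(weights, K, horizon, seed=0):
    """Hypothetical sketch: Beta-Bernoulli Thompson sampling for a
    simulated cascading bandit.

    weights : assumed true click probabilities, one per item
    K       : number of items recommended per round
    horizon : number of rounds to simulate
    Returns the posterior parameters and the total number of clicks.
    """
    rng = random.Random(seed)
    L = len(weights)
    # Assumption: uniform Beta(1, 1) prior on every item's click probability.
    alpha = [1] * L
    beta = [1] * L
    clicks = 0

    for _ in range(horizon):
        # Draw one posterior sample per item and recommend the K largest.
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(L)]
        ranked = sorted(range(L), key=lambda i: theta[i], reverse=True)[:K]

        # Cascading feedback: examine items top-down, stop at the first click.
        for item in ranked:
            if rng.random() < weights[item]:
                alpha[item] += 1  # examined and clicked
                clicks += 1
                break             # items further down stay unobserved
            beta[item] += 1       # examined but not clicked

    return alpha, beta, clicks
```

A call such as `ts_cascade_beta([0.3, 0.2, 0.1, 0.05], K=2, horizon=1000)` returns the final Beta parameters, from which the posterior mean of item `i` can be read off as `alpha[i] / (alpha[i] + beta[i])`.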