We consider the stochastic linear contextual bandit problem with high-dimensional features. We analyze the Thompson sampling algorithm using special classes of sparsity-inducing priors (e.g., spike-and-slab) to model the unknown parameter, and we provide a nearly optimal upper bound on the expected cumulative regret. To the best of our knowledge, this is the first work that provides theoretical guarantees for Thompson sampling in high-dimensional and sparse contextual bandits. For faster computation, we use variational inference instead of Markov chain Monte Carlo (MCMC) to approximate the posterior distribution. Extensive simulations demonstrate the improved performance of our proposed algorithm over existing ones.
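To make the setting concrete, the following is a minimal sketch of Thompson sampling in a linear contextual bandit. It uses an exact Gaussian (ridge-style) posterior purely as a computational stand-in for the spike-and-slab posterior and variational approximation analyzed in the paper; the dimensions, sparsity level, noise variance, and prior precision below are all illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 20, 5, 500                 # feature dim, arms per round, horizon (assumed)
s = 3                                # sparsity level (assumed)
theta = np.zeros(d)
theta[:s] = 1.0                      # sparse true parameter (hypothetical)

lam, sigma2 = 1.0, 0.25              # prior precision, noise variance (assumed)
B = lam * np.eye(d)                  # posterior precision matrix
f = np.zeros(d)                      # accumulated reward-weighted contexts

regret = 0.0
for t in range(T):
    # Observe K candidate contexts for this round.
    X = rng.normal(size=(K, d)) / np.sqrt(d)

    # Posterior over theta is N(B^{-1} f, B^{-1}); draw one Thompson sample.
    mu = np.linalg.solve(B, f)
    cov = np.linalg.inv(B)
    theta_tilde = rng.multivariate_normal(mu, cov)

    # Play the arm that is optimal under the sampled parameter.
    a = int(np.argmax(X @ theta_tilde))
    r = X[a] @ theta + rng.normal(scale=np.sqrt(sigma2))

    # Conjugate Bayesian linear-regression update.
    B += np.outer(X[a], X[a]) / sigma2
    f += X[a] * r / sigma2

    # Track instantaneous regret against the true best arm.
    regret += np.max(X @ theta) - X[a] @ theta

print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

Replacing the Gaussian posterior with a spike-and-slab prior makes the posterior intractable, which is why the paper resorts to variational inference rather than MCMC for the posterior approximation.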