A recent line of research studies the stochastic multi-armed bandit (MAB) problem in the case where temporal correlations of a specific structure are imposed between the player's actions and the reward distributions of the arms (Kleinberg and Immorlica [FOCS18], Basu et al. [NIPS19]). As opposed to the standard MAB setting, where the optimal solution in hindsight can be trivially characterized, these correlations lead to (sub-)optimal solutions that exhibit interesting dynamical patterns -- a phenomenon that yields new challenges from both an algorithmic and a learning perspective. In this work, we extend the above direction to a combinatorial bandit setting and study a variant of stochastic MAB where arms are subject to matroid constraints and each arm becomes unavailable (blocked) for a fixed number of rounds after each play. A natural common generalization of the state-of-the-art for blocking bandits and that for matroid bandits yields a $(1-\frac{1}{e})$-approximation for partition matroids, yet it only guarantees a $\frac{1}{2}$-approximation for general matroids. In this paper, we develop new algorithmic ideas that allow us to obtain a polynomial-time $(1 - \frac{1}{e})$-approximation algorithm (asymptotically and in expectation) for any matroid, and thus to control the $(1-\frac{1}{e})$-approximate regret. A key ingredient of our approach is the technique of correlated (interleaved) scheduling. Along the way, we discover an interesting connection to a variant of Submodular Welfare Maximization, for which we provide (asymptotically) matching upper and lower approximability bounds.
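To make the blocking constraint concrete, the following is a minimal toy sketch of the single-play blocking-bandit environment described above: after an arm is pulled, it becomes unavailable for a fixed number of subsequent rounds. The class names, parameters, and the greedy full-information benchmark below are illustrative assumptions, not the paper's algorithm (which additionally handles matroid constraints via correlated interleaved scheduling).

```python
import random


class BlockingBandit:
    """Toy blocking-bandit environment (illustrative only): after arm i
    is played at round t, it is blocked for delays[i] further rounds,
    i.e. it next becomes available at round t + 1 + delays[i]."""

    def __init__(self, means, delays, seed=0):
        self.means = means                      # Bernoulli reward means
        self.delays = delays                    # per-arm blocking delays
        self.blocked_until = [0] * len(means)   # first round each arm is free
        self.rng = random.Random(seed)

    def available(self, t):
        """Arms that are not blocked at round t."""
        return [i for i in range(len(self.means)) if self.blocked_until[i] <= t]

    def play(self, t, i):
        """Play arm i at round t; block it and draw a Bernoulli reward."""
        assert self.blocked_until[i] <= t, "arm is blocked"
        self.blocked_until[i] = t + 1 + self.delays[i]
        return 1.0 if self.rng.random() < self.means[i] else 0.0


def greedy_run(env, horizon):
    """Greedily play the highest-mean available arm each round
    (a known-means benchmark; with blocking this greedy schedule is
    generally only approximately optimal)."""
    total = 0.0
    for t in range(horizon):
        avail = env.available(t)
        if not avail:
            continue  # every arm is blocked this round
        i = max(avail, key=lambda j: env.means[j])
        total += env.play(t, i)
    return total
```

The `blocked_until` bookkeeping is the essential difference from the standard MAB setting: the action set itself evolves as a function of the play history, which is why the optimal schedule in hindsight is no longer a single fixed arm (or fixed independent set).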