We consider a contextual bandit problem with a combinatorial action set and time-varying base arm availability. At the beginning of each round, the agent observes the set of available base arms and their contexts and then selects an action that is a feasible subset of the set of available base arms to maximize its cumulative reward in the long run. We assume that the mean outcomes of base arms are samples from a Gaussian Process (GP) indexed by the context set ${\cal X}$, and the expected reward is Lipschitz continuous in expected base arm outcomes. For this setup, we propose an algorithm called Optimistic Combinatorial Learning and Optimization with Kernel Upper Confidence Bounds (O'CLOK-UCB) and prove that it incurs $\tilde{O}(\sqrt{\lambda^*(K)KT\overline{\gamma}_{T}} )$ regret with high probability, where $\overline{\gamma}_{T}$ is the maximum information gain associated with the set of base arm contexts that appeared in the first $T$ rounds, $K$ is the maximum cardinality of any feasible action over all rounds and $\lambda^*(K)$ is the maximum eigenvalue of all covariance matrices of selected actions up to time $T$, which is a function of $K$. To dramatically speed up the algorithm, we also propose a variant of O'CLOK-UCB that uses sparse GPs. Finally, we experimentally show that both algorithms exploit inter-base arm outcome correlation and vastly outperform the previous state-of-the-art UCB-based algorithms in realistic setups.
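The selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the squared-exponential kernel, the unit prior variance, the fixed confidence multiplier `beta`, and the function names are all assumptions made for illustration. It further assumes that every subset of at most $K$ available base arms is feasible and that the reward is additive, so the optimization oracle reduces to picking the $K$ largest indices. It also uses only marginal posterior variances, whereas the abstract states that the algorithm exploits correlation between base arm outcomes.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between rows of A and rows of B (assumed kernel).
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(X_obs, y_obs, X_new, noise_var=0.01, lengthscale=1.0):
    # Posterior mean and standard deviation of a zero-mean, unit-variance GP at X_new.
    if len(X_obs) == 0:
        return np.zeros(len(X_new)), np.ones(len(X_new))
    K = rbf_kernel(X_obs, X_obs, lengthscale) + noise_var * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_new, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def select_super_arm(X_obs, y_obs, X_avail, K, beta=4.0):
    # Optimistic index per available base arm: posterior mean plus scaled std.
    mean, std = gp_posterior(X_obs, y_obs, X_avail)
    index = mean + np.sqrt(beta) * std
    # Assumed oracle: any subset of size <= K is feasible and reward is additive,
    # so the best action under the indices is the top-K set.
    k = min(K, len(X_avail))
    chosen = np.argsort(-index)[:k]
    return chosen, index

# Two past observations, three currently available base arms with 1-D contexts.
X_obs = np.array([[0.0], [1.0]])
y_obs = np.array([1.0, -1.0])
X_avail = np.array([[0.0], [1.0], [5.0]])
chosen, index = select_super_arm(X_obs, y_obs, X_avail, K=2)
```

In this toy run the unexplored context at 5.0 gets a large index from its uncertainty alone, while the context at 0.0 gets one from its high posterior mean, so both are selected ahead of the context at 1.0.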