We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, we find our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB~setting, improving existing results there as well. We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.
翻译:使用概率触发臂的情境组合赌博算法的研究(C$^2$MAB-T),其考虑多种平滑条件,其涵盖广泛的应用,例如情境级联赌博和情境影响最大化赌博。在触发概率调制(TPM)条件下,我们设计了C$^2$-UCB-T算法,并提出了一种新颖的分析方法,实现了一个 $\tilde{O}(d\sqrt{KT})$后悔上限,消除了一个潜在指数级的大因子$O(1/p_{min})$,其中$d$为情景的维数,$p_{min}$是任何臂可以被触发的最小概率,批处理大小$K$是每轮最多可以触发的臂的数量。在方差调节(VM)或触发概率和方差调节(TPVM)条件下,我们提出了一种新的方差自适应算法VAC$^2$-UCB,并得出了一个 $\tilde{O}(d\sqrt{T})$的后悔上限,该上限与批处理大小$K$无关。作为一个有价值的副产品,我们发现我们的分析技术和方差自适应算法也可以应用于CMAB-T和C$^2$MAB算法中,改善现有结果。我们还将我们的算法与基准算法在合成和实际数据集上的性能进行了比较实验,证明了我们算法比基准算法性能更优秀。