We study the combinatorial multi-armed bandit with probabilistically triggered arms (CMAB-T) under semi-bandit feedback. We resolve a serious issue in prior CMAB-T studies, where the regret bounds contain a possibly exponentially large factor of $1/p^*$, with $p^*$ the minimum positive probability that an arm is triggered by any action. We address this issue by introducing a triggering probability modulated (TPM) bounded smoothness condition into the general CMAB-T framework, and show that many applications, such as influence maximization bandits and combinatorial cascading bandits, satisfy this TPM condition. As a result, we completely remove the factor of $1/p^*$ from the regret bounds, achieving significantly better regret bounds for influence maximization and cascading bandits than before. Finally, we provide lower bound results showing that the factor $1/p^*$ is unavoidable for general CMAB-T problems, suggesting that the TPM condition is crucial for removing this factor.
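As a sketch of the kind of condition involved (the notation here is illustrative: $r_S(\mu)$ denotes the expected reward of action $S$ under arm means $\mu$, and $p_i^{D,S}$ the probability that base arm $i$ is triggered when $S$ is played), a TPM bounded smoothness condition can be written as:

```latex
% Illustrative 1-norm TPM bounded smoothness condition:
% for any action S and any two mean vectors \mu, \mu',
\[
  \bigl| r_S(\mu') - r_S(\mu) \bigr|
  \;\le\; B \sum_{i \in [m]} p_i^{D,S} \, \bigl| \mu'_i - \mu_i \bigr|,
\]
% where B is a smoothness constant and m is the number of base arms.
% The modulation by p_i^{D,S} means that arms unlikely to be triggered
% contribute proportionally little to the reward difference, which is
% the mechanism that avoids the 1/p^* factor in the regret analysis.
```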