In this paper, we study combinatorial semi-bandits (CMAB) and focus on reducing the dependency on the batch size $K$ in the regret bound, where $K$ is the total number of arms that can be pulled or triggered in each round. First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we discover a novel (directional) triggering probability and variance modulated (TPVM) condition that can replace the previously-used smoothness condition for various applications, such as cascading bandits, online network exploration, and online influence maximization. Under this new condition, we propose a BCUCB-T algorithm with variance-aware confidence intervals and conduct a regret analysis that reduces the $O(K)$ factor to $O(\log K)$ or $O(\log^2 K)$ in the regret bound, significantly improving the regret bounds for the above applications. Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages the non-triggering version of the TPVM condition and completely removes the dependency on $K$ in the leading regret term. As a valuable by-product, the regret analysis used in this paper improves several existing results by a factor of $O(\log K)$. Finally, experimental evaluations show our superior performance compared with benchmark algorithms in different applications.