Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $K$ out of $N$ arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $K$ arms. The direct use of multi-armed bandit requires choosing among $N$-choose-$K$ options making the state space large. In this paper, we present a novel algorithm which is computationally efficient and the storage is linear in $N$. The proposed algorithm is a divide-and-conquer based strategy, that we call CMAB-SM. Further, the proposed algorithm achieves a \textit{regret bound} of $\tilde O(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$ for a time horizon $T$, which is \textit{sub-linear} in all parameters $T$, $N$, and $K$. %When applied to the problem of Social Influence Maximization, the performance of the proposed algorithm surpasses the UCB algorithm and some more sophisticated domain-specific methods.
翻译:社会影响最大化等许多现实世界问题,如社会影响最大化面临在某一时刻从美元选项中选择最佳的K$的两难困境。 这种设置可以仿制成一个组合式的土匪,每次从美元军火中选择K$,目的是实现勘探和开采之间的有效交易。 这是组合式土匪的首个工作, 收到的反馈可以是所选的K美元武器的非线性功能。 多武装的复杂土匪的直接使用需要选择使国家空间大化的$- 美元选项。 在本文中, 我们展示了一种新型的算法, 计算效率高, 存储线以美元为线性。 提议的算法是一种基于分化的策略, 我们称之为CMAB- SM。 此外, 拟议的算法可以达到$\ textitilde O (K ⁇ frac{1 ⁇ 2 ⁇ N ⁇ frac {T\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\