We consider combinatorial semi-bandits over a set of arms ${\cal X} \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = {\cal O}\Big( {d (\ln m)^2 (\ln T) \over \Delta_{\min} }\Big)$, where $m$ is the maximal number of items in an arm and $\Delta_{\min}$ is the minimal reward gap. However, its computational complexity is ${\cal O}(|{\cal X}|)$, which is typically exponential in $d$, so it cannot be used in high dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with regret $R(T) = {\cal O} \Big({d (\ln m)^2 (\ln T)\over \Delta_{\min} }\Big)$ and computational complexity ${\cal O}(T {\bf poly}(d))$. Our approach has three steps: we carefully design an approximate version of ESCB with the same regret guarantees, show that this approximate algorithm can be implemented in time ${\cal O}(T {\bf poly}(d))$ by repeatedly maximizing a linear function over ${\cal X}$ subject to a linear budget constraint, and show how to solve these maximization problems efficiently.
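The abstract does not spell out the reduction, so the following Python sketch illustrates one way an index maximization can be turned into a sequence of linear maximizations under a linear constraint. Everything here is an assumption for illustration, not the authors' construction: the arms are the $m$-subsets of $\{1,\dots,d\}$; the optimistic index has the ESCB-style form $x^\top\hat\theta + \sqrt{(f_t/2)\sum_i x_i/N_i}$, where $N_i$ is the number of observations of item $i$ and $f_t$ is an exploration parameter; the constraint is written as a lower bound $\sum_i x_i/N_i \ge b$; and `m_sets`, `escb_style_index`, `constrained_linear_max` and `grid_escb` are hypothetical names. The oracle is brute force, so the sketch only checks the reduction and does not achieve ${\cal O}(T\,{\bf poly}(d))$ time, which requires an efficient solver for the constrained problem.

```python
import itertools
import math

# Hypothetical arm set: all m-subsets of {0, ..., d-1}, i.e. the vectors in
# {0,1}^d with exactly m ones, stored as tuples of selected item indices.
def m_sets(d, m):
    return list(itertools.combinations(range(d), m))

def mean_term(arm, theta_hat):
    # Linear part x^T theta_hat.
    return sum(theta_hat[i] for i in arm)

def width_term(arm, counts):
    # Linear part sum_i x_i / N_i that appears inside the square root.
    return sum(1.0 / counts[i] for i in arm)

def escb_style_index(arm, theta_hat, counts, f_t):
    # Assumed ESCB-style index: x^T theta_hat + sqrt(f_t / 2 * sum_i x_i / N_i).
    return mean_term(arm, theta_hat) + math.sqrt(0.5 * f_t * width_term(arm, counts))

def brute_force_escb(arms, theta_hat, counts, f_t):
    # Exact maximization: one index evaluation per arm, hence O(|X|) work.
    return max(arms, key=lambda a: escb_style_index(a, theta_hat, counts, f_t))

def constrained_linear_max(arms, theta_hat, counts, b):
    # Hypothetical oracle: maximize x^T theta_hat subject to sum_i x_i / N_i >= b.
    # Brute force here, for checking only; returns None if nothing is feasible.
    feasible = [a for a in arms if width_term(a, counts) >= b]
    if not feasible:
        return None
    return max(feasible, key=lambda a: mean_term(a, theta_hat))

def grid_escb(arms, theta_hat, counts, f_t, eps=0.1):
    # Approximate maximization: one oracle call per point of a geometric grid of b.
    m = len(arms[0])
    b_lo = m / max(counts)  # lower bound on sum_i x_i / N_i over all arms
    b_hi = m / min(counts)  # upper bound on sum_i x_i / N_i over all arms
    b = b_lo / (1 + eps)    # start one step lower to absorb rounding errors
    best, best_val = None, -math.inf
    while b <= b_hi * (1 + eps):
        arm = constrained_linear_max(arms, theta_hat, counts, b)
        if arm is not None:
            val = escb_style_index(arm, theta_hat, counts, f_t)
            if val > best_val:
                best, best_val = arm, val
        b *= 1 + eps
    return best
```

Writing the constraint as a lower bound means that if $x^\star$ maximizes the index and $b$ is the largest grid point below $\sum_i x^\star_i/N_i$, the oracle returns an arm with at least the mean of $x^\star$ and a width term above $\sum_i x^\star_i/N_i/(1+\epsilon)$. Hence `grid_escb` never exceeds the exact maximum and loses at most a factor $1/\sqrt{1+\epsilon}$ on the square-root term, while the grid has ${\cal O}(\log(\max_i N_i/\min_i N_i)/\epsilon)$ points instead of $|{\cal X}|$ candidates.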