In this paper, we consider the multi-armed bandit problem with high-dimensional features. First, we prove a minimax lower bound, $\mathcal{O}\big((\log d)^{\frac{\alpha+1}{2}}T^{\frac{1-\alpha}{2}}+\log T\big)$, for the cumulative regret, in terms of the horizon $T$, the dimension $d$, and a margin parameter $\alpha\in[0,1]$ that controls the separation between the optimal and the sub-optimal arms. This new lower bound unifies existing regret bounds whose dependencies on $T$ differ because their assumptions implicitly fix different values of the margin parameter $\alpha$. Second, we propose a simple and computationally efficient algorithm, inspired by the general Upper Confidence Bound (UCB) strategy, that achieves a regret upper bound matching the lower bound. The proposed algorithm uses a properly centered $\ell_1$-ball as its confidence set, in contrast to the commonly used ellipsoid confidence set. In addition, the algorithm requires no forced-sampling step and is thereby adaptive to the practically unknown margin parameter. Simulations and a real-data analysis are conducted to compare the proposed method with existing ones in the literature.
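One attraction of the $\ell_1$-ball confidence set is that the resulting UCB index has a closed form: maximizing $x^\top\theta$ over $\{\theta:\|\theta-\hat\theta\|_1\le\rho\}$ gives $x^\top\hat\theta+\rho\,\|x\|_\infty$. The sketch below illustrates this arm-selection rule; the function name, the estimator $\hat\theta$ (e.g. from a Lasso fit), and the radius $\rho$ are placeholders, not the paper's exact construction.

```python
import numpy as np

def select_arm_l1_ucb(theta_hat, radius, arm_features):
    """Pick the arm with the largest l1-ball UCB index.

    theta_hat    : (d,) center of the confidence set (e.g. a Lasso estimate)
    radius       : scalar rho, the l1-ball radius (assumed given)
    arm_features : (K, d) feature vectors, one row per arm
    """
    # max_{||theta - theta_hat||_1 <= rho} x^T theta = x^T theta_hat + rho * ||x||_inf,
    # since the dual norm of the l1 norm is the sup norm.
    ucb_scores = arm_features @ theta_hat + radius * np.max(np.abs(arm_features), axis=1)
    return int(np.argmax(ucb_scores))
```

This closed form avoids the quadratic optimization needed for an ellipsoidal confidence set, which is part of why the method stays computationally cheap in high dimension.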