In this paper, we revisit the regret minimization problem in sparse stochastic contextual linear bandits, where feature vectors may be of large dimension $d$, but where the reward function depends only on a few, say $s_0\ll d$, of these features. We present Thresholded Lasso bandit, an algorithm that (i) estimates the vector defining the reward function as well as its sparse support, i.e., the significant feature elements, using the Lasso framework with thresholding, and (ii) selects an arm greedily according to this estimate projected on its support. The algorithm does not require prior knowledge of the sparsity index $s_0$ and can be parameter-free. For this simple algorithm, we establish non-asymptotic regret upper bounds scaling as $\mathcal{O}( \log d + \sqrt{T} )$ in general, and as $\mathcal{O}( \log d + \log T)$ under the so-called margin condition (a probabilistic condition on the separation of the arm rewards). The regret of previous algorithms scales as $\mathcal{O}( \log d + \sqrt{T \log (d T)})$ and $\mathcal{O}( \log T \log d)$ in the two settings, respectively. Through numerical experiments, we confirm that our algorithm outperforms existing methods.
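To make the estimate-then-threshold greedy loop described above concrete, here is a minimal sketch in Python using scikit-learn's Lasso. The regularization schedule `lam_t`, the thresholding level, the warm-up length, and the simulated environment are illustrative assumptions, not the paper's exact constants or analysis.

```python
# Hedged sketch of a thresholded-Lasso greedy bandit loop (illustrative constants).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, s0, K, T = 100, 5, 10, 2000            # ambient dimension, sparsity, arms, horizon
theta = np.zeros(d); theta[:s0] = 1.0     # unknown sparse reward parameter (simulation only)

X_hist, r_hist = [], []
for t in range(1, T + 1):
    arms = rng.normal(size=(K, d))        # context: one feature vector per arm

    if t <= 10:                           # short forced-exploration warm-up (assumption)
        a = int(rng.integers(K))
    else:
        lam_t = 0.5 * np.sqrt(np.log(d * t) / t)         # illustrative regularization schedule
        lasso = Lasso(alpha=lam_t, max_iter=5000)
        lasso.fit(np.array(X_hist), np.array(r_hist))    # Lasso estimate from all past data
        est = lasso.coef_.copy()
        est[np.abs(est) <= lam_t] = 0.0                  # threshold: keep only significant coordinates
        a = int(np.argmax(arms @ est))                   # greedy arm w.r.t. the projected estimate

    reward = float(arms[a] @ theta) + rng.normal(scale=0.1)   # noisy linear reward
    X_hist.append(arms[a]); r_hist.append(reward)
```

Note that no knowledge of $s_0$ enters the loop: the support is recovered implicitly by zeroing the coordinates below the threshold, which is what allows the arm selection to remain purely greedy.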