In this paper, we revisit the regret minimization problem in sparse stochastic contextual linear bandits, where feature vectors may be of large dimension $d$, but where the reward function depends on only a few, say $s_0\ll d$, of these features. We present Thresholded Lasso bandit, an algorithm that (i) estimates the vector defining the reward function as well as its sparse support, i.e., the set of significant features, using the Lasso framework with thresholding, and (ii) selects an arm greedily according to this estimate projected onto its support. The algorithm does not require prior knowledge of the sparsity index $s_0$ and can be parameter-free under some symmetry assumptions. For this simple algorithm, we establish non-asymptotic regret upper bounds scaling as $\mathcal{O}( \log d + \sqrt{T} )$ in general, and as $\mathcal{O}( \log d + \log T)$ under the so-called margin condition (a probabilistic condition on the separation of the arm rewards). The regret of previous algorithms scales as $\mathcal{O}( \log d + \sqrt{T \log (d T)})$ and $\mathcal{O}( \log T \log d)$ in these two settings, respectively. Through numerical experiments, we confirm that our algorithm outperforms existing methods.
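To make the two steps concrete, the following is a minimal Python sketch of the Thresholded Lasso bandit loop, using scikit-learn's `Lasso` as the estimator. The regularization scale `lambda_0`, the decaying penalty, the thresholding rule, the Gaussian context distribution, and all constants here are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Minimal sketch of the Thresholded Lasso bandit loop described above.
# All constants (lambda_0, noise scale, context distribution) are
# illustrative assumptions, not the paper's exact specification.

rng = np.random.default_rng(0)
d, K, T, s0 = 100, 10, 2000, 5          # ambient dim, arms, horizon, sparsity
theta = np.zeros(d)
theta[:s0] = 1.0                        # unknown sparse reward parameter

contexts, rewards = [], []
lambda_0 = 0.1                          # regularization scale (assumed)

for t in range(1, T + 1):
    X = rng.normal(size=(K, d))         # feature vectors of the K arms at round t
    if t == 1:
        a = int(rng.integers(K))        # no data yet: pick an arm arbitrarily
    else:
        # (i) Lasso estimate with a decaying penalty ~ sqrt(log(d) / t)
        lam = lambda_0 * np.sqrt(np.log(d) / t)
        lasso = Lasso(alpha=lam).fit(np.vstack(contexts), np.array(rewards))
        theta_hat = lasso.coef_
        # Threshold at the penalty level (an illustrative choice) to
        # estimate the support, then project the estimate onto it.
        support = np.abs(theta_hat) > lam
        theta_proj = np.where(support, theta_hat, 0.0)
        # (ii) Greedy arm selection using the projected estimate
        a = int(np.argmax(X @ theta_proj))
    r = X[a] @ theta + rng.normal(scale=0.1)   # noisy linear reward
    contexts.append(X[a])
    rewards.append(r)
```

Since the arm is chosen greedily, exploration here comes only from the randomness of the contexts; no forced exploration or confidence bonus is used.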