In this paper, we revisit the regret minimization problem in sparse stochastic contextual linear bandits, where feature vectors may be of large dimension $d$, but where the reward function depends on a few, say $s_0\ll d$, of these features only. We present Thresholded Lasso bandit, an algorithm that (i) estimates the vector defining the reward function as well as its sparse support, i.e., significant feature elements, using the Lasso framework with thresholding, and (ii) selects an arm greedily according to this estimate projected on its support. The algorithm does not require prior knowledge of the sparsity index $s_0$. For this simple algorithm, we establish non-asymptotic regret upper bounds scaling as $\mathcal{O}( \log d + \sqrt{T} )$ in general, and as $\mathcal{O}( \log d + \log T)$ under the so-called margin condition (a setting where arms are well separated). The regret of previous algorithms scales as $\mathcal{O}( \log d + \sqrt{T \log (d T)})$ and $\mathcal{O}( \log T \log d)$ in the two settings, respectively. Through numerical experiments, we confirm that our algorithm outperforms existing methods.
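To make the two-step procedure above concrete, the following is a minimal simulation sketch, not the paper's exact algorithm: the regularization schedule `lam_t`, the thresholding rule, the problem sizes, and all variable names are illustrative assumptions, and the Lasso is re-fit at every round purely for simplicity.

```python
# Sketch of a thresholded-Lasso bandit loop: (i) Lasso estimate with
# thresholding to recover a sparse support, (ii) greedy arm selection
# using the estimate projected on that support. All constants are
# illustrative assumptions, not the paper's prescribed choices.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, K, T, s0 = 100, 10, 500, 5          # ambient dim, arms, horizon, sparsity

# Unknown sparse parameter: only s0 of the d coordinates are nonzero.
theta_star = np.zeros(d)
theta_star[:s0] = rng.uniform(0.5, 1.0, size=s0)

X_hist, y_hist = [], []
theta_hat = np.zeros(d)
support = np.arange(d)                  # start from the full feature set

for t in range(1, T + 1):
    contexts = rng.normal(size=(K, d))  # one feature vector per arm

    # (ii) Greedy arm selection with the estimate projected on its support.
    scores = contexts[:, support] @ theta_hat[support]
    a = int(np.argmax(scores))

    reward = contexts[a] @ theta_star + 0.1 * rng.normal()
    X_hist.append(contexts[a])
    y_hist.append(reward)

    # (i) Lasso estimate, then thresholding to estimate the sparse support.
    lam_t = 0.05 * np.sqrt(np.log(d) / t)          # illustrative schedule
    lasso = Lasso(alpha=lam_t, fit_intercept=False, max_iter=5000)
    lasso.fit(np.asarray(X_hist), np.asarray(y_hist))
    theta_hat = lasso.coef_
    support = np.flatnonzero(np.abs(theta_hat) > lam_t)
    if support.size == 0:
        support = np.arange(d)          # fall back if nothing survives the threshold
```

Note that no prior knowledge of the sparsity index $s_0$ enters the loop: the support is read off the thresholded Lasso coefficients at each round.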