We consider a stochastic contextual bandit problem in which the dimension $d$ of the feature vectors is potentially large, but only a sparse subset of features, of cardinality $s_0 \ll d$, affects the reward function. Essentially all existing algorithms for sparse bandits require a priori knowledge of the sparsity index $s_0$. This knowledge is almost never available in practice, and misspecifying this parameter can severely degrade the performance of existing methods. The main contribution of this paper is an algorithm that does not require prior knowledge of the sparsity index $s_0$, together with tight regret bounds on its performance under mild conditions. We also comprehensively evaluate the proposed algorithm numerically and show that it consistently outperforms existing methods, even when the correct sparsity index is revealed to them but kept hidden from our algorithm.