Variance-Aware Sparse Linear Bandits

It is well known that for sparse linear bandits, when the dependency on the sparsity (which is much smaller than the ambient dimension) is ignored, the worst-case minimax regret is $\widetilde{\Theta}\left(\sqrt{dT}\right)$, where $d$ is the ambient dimension and $T$ is the number of rounds. On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve $\widetilde{\mathcal O}(1)$ regret, which is (nearly) independent of $d$ and $T$. In this paper, we present the first variance-aware regret guarantee for sparse linear bandits: $\widetilde{\mathcal O}\left(\sqrt{d\sum_{t=1}^T \sigma_t^2} + 1\right)$, where $\sigma_t^2$ is the variance of the noise at the $t$-th round. This bound naturally interpolates between the regret bounds for the worst-case constant-variance regime (i.e., $\sigma_t \equiv \Omega(1)$) and the benign deterministic regime (i.e., $\sigma_t \equiv 0$). To achieve this variance-aware regret guarantee, we develop a general framework that converts any variance-aware linear bandit algorithm into a variance-aware algorithm for sparse linear bandits in a "black-box" manner. Specifically, we take two recent algorithms as black boxes to illustrate that the claimed bounds indeed hold; the first algorithm can handle the unknown-variance case, and the second is more computationally efficient.
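As a quick check of the interpolation claim, the two extreme regimes can be substituted into the stated bound. The constant $\sigma$ below is an illustrative assumption (a fixed noise level of order one), and logarithmic factors and the sparsity dependence are left inside the $\widetilde{\mathcal O}$ notation, as in the abstract.

$$\sigma_t \equiv \sigma = \Theta(1): \quad \sqrt{d\sum_{t=1}^T \sigma_t^2} + 1 = \sigma\sqrt{dT} + 1 = \Theta\left(\sqrt{dT}\right)$$

$$\sigma_t \equiv 0: \quad \sqrt{d\sum_{t=1}^T \sigma_t^2} + 1 = 1$$

The first case recovers the worst-case $\widetilde{\Theta}\left(\sqrt{dT}\right)$ rate, and the second recovers the $\widetilde{\mathcal O}(1)$ regret of the noiseless setting.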
