High-dimensional linear bandits with low-dimensional structure have received considerable attention in recent studies due to their practical significance. The most common structure in the literature is sparsity; however, it may not hold in practice. Symmetry, where the reward is invariant under certain groups of transformations on the set of arms, is another important inductive bias in the high-dimensional setting that covers many standard structures, including sparsity. In this work, we study high-dimensional symmetric linear bandits where the symmetry is hidden from the learner and the correct symmetry must be learned in an online setting. We examine the structure of a collection of hidden symmetries and propose a method based on model selection within a collection of low-dimensional subspaces. Our algorithm achieves a regret bound of $O(d_0^{2/3} T^{2/3} \log(d))$, where $d$ is the ambient dimension, which is potentially very large, and $d_0 \ll d$ is the dimension of the true low-dimensional subspace. Under an additional assumption that the models are well separated, we further improve the regret to $O(d_0 \sqrt{T \log(d)})$.