We present a non-asymptotic lower bound on the eigenspectrum of the design matrix generated by any linear bandit algorithm with sub-linear regret when the action set has well-behaved curvature. Specifically, we show that the minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$, where $n$ is the learning horizon and the action space has a constant Hessian around the optimal arm. Such action spaces therefore force a polynomial lower bound, in contrast to the logarithmic lower bound shown by \cite{lattimore2017end} for discrete (i.e., well-separated) action spaces. Furthermore, while that earlier result holds only in the asymptotic regime (as $n \to \infty$), our result for these "locally rich" action spaces holds at any time. Additionally, under a mild technical assumption, we obtain a similar lower bound on the minimum eigenvalue that holds with high probability. We apply our result to two practical scenarios: \emph{model selection} and \emph{clustering} in linear bandits. For model selection, we use the spectral bound to show that an epoch-based linear bandit algorithm adapts to the true model complexity at a rate exponential in the number of epochs. For clustering, we consider a multi-agent framework and use the spectral result to show that no forced exploration is necessary: the agents can run a linear bandit algorithm and estimate their underlying parameters simultaneously, and hence incur low regret.
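In symbols, the main claim can be sketched as follows. The notation is an assumption made here for illustration and does not appear in the abstract: $A_t$ denotes the action played in round $t$, $V_n$ the (unregularized) design matrix after $n$ rounds, and $R_n$ the cumulative regret. For an action set with a constant Hessian around the optimal arm,
\[
V_n = \sum_{t=1}^{n} A_t A_t^\top, \qquad \mathbb{E}[R_n] = O(\sqrt{n}) \;\Longrightarrow\; \lambda_{\min}\bigl(\mathbb{E}[V_n]\bigr) = \Omega(\sqrt{n}).
\]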