We study a model selection problem in the linear bandit setting, where the learner must adapt on the fly to the dimension of the optimal hypothesis class while balancing exploration and exploitation. More specifically, we assume a sequence of nested linear hypothesis classes with dimensions $d_1 < d_2 < \dots$, and the goal is to automatically adapt to the smallest hypothesis class that contains the true linear model. Although previous papers provide various guarantees for this model selection problem, the analysis therein either works only in favorable cases where one can cheaply conduct statistical testing to locate the right hypothesis class, or is based on the idea of "corralling" multiple base algorithms, which often performs relatively poorly in practice. These works also mainly focus on upper bounding the regret. In this paper, we first establish a lower bound showing that, even with a fixed action set, adaptation to the unknown intrinsic dimension $d_\star$ comes at a cost: no algorithm can achieve the regret bound $\widetilde{O}(\sqrt{d_\star T})$ simultaneously for all values of $d_\star$. We also bring new ideas, namely constructing virtual mixture-arms to effectively summarize useful information, to the model selection problem in linear bandits. Under a mild assumption on the action set, we design a Pareto optimal algorithm whose guarantees match the rate in the lower bound. Experiments corroborate our theory and demonstrate the advantages of our algorithm over prior work.