The model selection problem in the pure exploration linear bandit setting is introduced and studied in both the fixed confidence and fixed budget settings. The model selection problem considers a nested sequence of hypothesis classes of increasing complexity. Our goal is to automatically adapt to the instance-dependent complexity measure of the smallest hypothesis class containing the true model, rather than suffering the complexity measure associated with the largest hypothesis class. We provide evidence that a standard doubling trick over the dimension fails to achieve the optimal instance-dependent sample complexity. Our algorithms define a new optimization problem based on experimental design that leverages the geometry of the action set to efficiently identify a near-optimal hypothesis class. Our fixed budget algorithm uses a novel application of a selection-validation trick in bandits, which also yields a new method for the understudied fixed budget setting in linear bandits, even without the added challenge of model selection. We further generalize the model selection problem to the misspecified regime, adapting our algorithms in both the fixed confidence and fixed budget settings.
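To make the experimental-design ingredient concrete, the following is a minimal illustrative sketch, not the paper's algorithm: it computes a classical G-optimal design over a finite action set via Frank-Wolfe iterations, the standard way such geometry-aware allocations are approximated in linear bandits. The function name `g_optimal_design` and all parameters are hypothetical choices for this example.

```python
import numpy as np

def g_optimal_design(X, n_iters=1000):
    """Frank-Wolfe iterations for the G-optimal (equivalently D-optimal) design:
    maximize log det(sum_i lam_i x_i x_i^T) over the probability simplex.
    X: (K, d) array of actions; returns design weights lam of shape (K,)."""
    K, d = X.shape
    lam = np.full(K, 1.0 / K)            # start from the uniform design
    for t in range(n_iters):
        A = X.T @ (lam[:, None] * X)     # information matrix A(lam)
        A_inv = np.linalg.pinv(A)        # pseudo-inverse guards against rank deficiency
        # leverage scores ||x_i||^2_{A(lam)^{-1}}, the per-action prediction variances
        g = np.einsum('ij,jk,ik->i', X, A_inv, X)
        i = int(np.argmax(g))            # Frank-Wolfe vertex: action with largest variance
        gamma = 2.0 / (t + 2)            # standard Frank-Wolfe step size
        lam = (1 - gamma) * lam
        lam[i] += gamma
    return lam

# Example usage on a random action set in R^4:
X = np.random.randn(20, 4)
lam = g_optimal_design(X)
# By the Kiefer-Wolfowitz equivalence theorem, the max leverage score of an
# optimal design equals d, so the value below should be close to 4.
print(np.max(np.einsum('ij,jk,ik->i', X, np.linalg.pinv(X.T @ (lam[:, None] * X)), X)))
```

Sampling actions in proportion to such a design controls estimation variance uniformly over the action set, which is the basic role experimental design plays in pure exploration linear bandits; the paper's contribution is a new design-based optimization problem tailored to identifying a near-optimal hypothesis class.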