Model selection in contextual bandits is an important complementary problem to regret minimization with respect to a fixed model class. We consider the simplest non-trivial instance of model selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem. Even in this instance, current state-of-the-art methods explore in a suboptimal manner and require strong "feature-diversity" conditions. In this paper, we introduce new algorithms that a) explore in a data-adaptive manner, and b) provide model selection guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$ for some $\alpha \in (0,1)$, with no feature-diversity conditions whatsoever, where $d$ denotes the dimension of the linear model and $T$ denotes the total number of rounds. The first algorithm enjoys a "best-of-both-worlds" property, simultaneously recovering two prior results that hold under distinct distributional assumptions. The second removes distributional assumptions altogether, expanding the scope for tractable model selection. Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
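To make the shape of the guarantee concrete, consider an illustrative instantiation; the specific value $\alpha = 1/2$ below is chosen purely for illustration and is not a rate claimed above:
\[
\alpha = \tfrac{1}{2} \;\Longrightarrow\; \mathcal{O}\!\left(d^{1/2}\, T^{1/2}\right) = \mathcal{O}\!\left(\sqrt{dT}\right),
\]
which has the familiar $\sqrt{dT}$-type scaling of regret bounds for $d$-dimensional linear contextual bandits (up to logarithmic factors). More generally, larger $\alpha$ trades a worse dependence on the dimension $d$ for a better dependence on the horizon $T$, and any $\alpha \in (0,1)$ yields regret sublinear in $T$.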