We consider the problem of optimizing a black-box function based on noisy bandit feedback. Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem. However, they heavily rely on the assumption that the model is well-specified, and they can fail without it. Instead, we introduce a \emph{misspecified} kernelized bandit setting in which the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS). We design efficient and practical algorithms whose performance degrades minimally in the presence of model misspecification. Specifically, we present two algorithms based on Gaussian process (GP) methods: an optimistic EC-GP-UCB algorithm that requires knowing the misspecification error, and Phased GP Uncertainty Sampling, an elimination-type algorithm that can adapt to unknown model misspecification. We provide upper bounds on their cumulative regret in terms of $\epsilon$, the time horizon, and the underlying kernel, and we show that the latter algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification. In addition, in a stochastic contextual setting, we show that EC-GP-UCB can be effectively combined with the regret bound balancing strategy and attain similar regret bounds despite not knowing $\epsilon$.
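For concreteness, a minimal sketch of the misspecified setting and of the enlarged-confidence idea behind EC-GP-UCB is given below; the symbols $\tilde{f}$, $B$, $\eta_t$, $\beta_t$, $b_t(\epsilon)$, $\mu_{t-1}$, and $\sigma_{t-1}$ are illustrative notation chosen here, not the paper's formal definitions.

% Sketch of the misspecified kernelized bandit setting (illustrative notation):
% the learner queries $x_t$ and observes $y_t = f(x_t) + \eta_t$ with sub-Gaussian noise $\eta_t$,
% where $f$ need not lie in the RKHS $\mathcal{H}_k$, but is $\epsilon$-close to it in the sup norm:
\[
  \exists\, \tilde{f} \in \mathcal{H}_k \ \text{with}\ \|\tilde{f}\|_{k} \le B
  \quad\text{and}\quad
  \sup_{x \in \mathcal{X}} \bigl| f(x) - \tilde{f}(x) \bigr| \le \epsilon .
\]
% An enlarged-confidence UCB rule in the spirit of EC-GP-UCB would then widen the usual
% GP-UCB confidence multiplier $\beta_t$ by some $\epsilon$-dependent term $b_t(\epsilon)$
% (hypothetical placeholder) and select
\[
  x_t \in \operatorname*{arg\,max}_{x \in \mathcal{X}} \;
  \mu_{t-1}(x) + \bigl(\beta_t + b_t(\epsilon)\bigr)\, \sigma_{t-1}(x),
\]
% where $\mu_{t-1}$ and $\sigma_{t-1}$ denote the GP posterior mean and standard deviation
% computed from the first $t-1$ observations.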