In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to vary over time within a reproducing kernel Hilbert space (RKHS). For this setting, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is coping with infinite-dimensional feature maps. We address it by leveraging kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee for weighted time-varying bandits with general nonlinear rewards. This result generalizes those for both non-stationary linear bandits and standard GP-UCB algorithms. Further, we establish a novel concentration inequality for weighted Gaussian process regression with general weights. We also provide universal and weight-dependent upper bounds on the weighted maximum information gain. These results are of independent interest for applications such as news ranking and adaptive pricing, where weights can capture the importance or quality of data. Finally, we conduct experiments showing that the proposed algorithm compares favorably with existing methods in many cases.
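To make the idea of a UCB rule built on weighted GP regression concrete, the following is a minimal sketch and not the WGP-UCB algorithm itself. It assumes an RBF kernel, exponential discount weights `gamma ** age` (one possible choice of weights), a fixed exploration constant `beta`, and the standard weighted kernel ridge estimator, in which a weight `w_s` acts like a per-sample noise level `lam / w_s`. All function names and parameter values are hypothetical; the estimator and confidence width analyzed in the paper may differ.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """RBF kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def weighted_gp_posterior(X, y, w, X_cand, lam=1.0, lengthscale=0.2):
    """Weighted kernel ridge mean and variance proxy at candidate points.

    Minimizes sum_s w_s * (y_s - f(x_s))^2 + lam * ||f||^2, which is
    equivalent to GP regression with per-sample noise variance lam / w_s.
    """
    K = rbf_kernel(X, X, lengthscale)            # (n, n)
    k_star = rbf_kernel(X, X_cand, lengthscale)  # (n, m)
    A = K + lam * np.diag(1.0 / w)               # low weight = high noise
    sol = np.linalg.solve(A, np.column_stack([y, k_star]))
    mean = k_star.T @ sol[:, 0]
    # k(x, x) = 1 for the RBF kernel used here.
    var = 1.0 - np.sum(k_star * sol[:, 1:], axis=0)
    return mean, np.maximum(var, 0.0)

def ucb_choice(X, y, X_cand, gamma=0.95, beta=2.0, lam=1.0):
    """Index of the candidate maximizing mean + beta * std (assumed rule)."""
    n = len(y)
    # Assumed discounting: the newest sample has weight 1, older ones decay.
    w = gamma ** np.arange(n - 1, -1, -1)
    mean, var = weighted_gp_posterior(X, y, w, X_cand, lam)
    return int(np.argmax(mean + beta * np.sqrt(var)))
```

With all weights equal to one, this reduces to the usual GP posterior with noise level `lam`; shrinking the weight of an old observation makes the fit rely on recent data, which is the mechanism that lets a weighted estimator track a time-varying function.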