Non-stationary parametric bandits have attracted much attention recently. There are three principled ways to deal with non-stationarity: the sliding-window, weighted, and restart strategies. Because many non-stationary environments drift gradually, the weighted strategy is commonly adopted in real-world applications. However, previous theoretical studies show that its analysis is more involved, and the resulting algorithms are either computationally less efficient or statistically suboptimal. This paper revisits the weighted strategy for non-stationary parametric bandits. In linear bandits (LB), we find that this shortcoming stems from an inadequate regret analysis, which leads to an overly complex algorithm design. We propose a refined analysis framework that simplifies the derivation and, importantly, produces a simpler weight-based algorithm that is as efficient as window/restart-based algorithms while retaining the same regret as previous studies. Furthermore, our new framework can be used to improve the regret bounds of other parametric bandits, including Generalized Linear Bandits (GLB) and Self-Concordant Bandits (SCB). For example, we develop a simple weighted GLB algorithm with an $\widetilde{O}(k_\mu^{\frac{5}{4}} c_\mu^{-\frac{3}{4}} d^{\frac{3}{4}} P_T^{\frac{1}{4}}T^{\frac{3}{4}})$ regret, improving the $\widetilde{O}(k_\mu^{2} c_\mu^{-1}d^{\frac{9}{10}} P_T^{\frac{1}{5}}T^{\frac{4}{5}})$ bound in prior work, where $k_\mu$ and $c_\mu$ characterize the nonlinearity of the reward model, $P_T$ measures the non-stationarity, and $d$ and $T$ denote the dimension and the time horizon, respectively.
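To make the weighted strategy concrete, the following is a minimal sketch of a generic discounted ridge-regression estimator for a drifting linear reward model. It is not the algorithm proposed in this paper: the class name `DiscountedRidge`, the discount factor `gamma`, and the regularizer `lam` are illustrative assumptions, and the confidence-bound and arm-selection steps of a full bandit algorithm are omitted.

```python
import numpy as np


class DiscountedRidge:
    """Weighted least squares in which an observation that is k rounds old
    receives weight gamma**k (hypothetical illustration, not the paper's method)."""

    def __init__(self, dim, gamma=0.95, lam=1.0):
        self.gamma = gamma              # discount factor in (0, 1); assumed value
        self.lam = lam                  # ridge regularization; assumed value
        self.V = lam * np.eye(dim)      # weighted Gram matrix plus lam * I
        self.b = np.zeros(dim)          # weighted sum of reward * feature

    def update(self, x, reward):
        x = np.asarray(x, dtype=float)
        # Discount past data, add the new sample, and top the regularizer
        # back up so that its contribution stays exactly lam * I.
        self.V = (self.gamma * self.V + np.outer(x, x)
                  + (1.0 - self.gamma) * self.lam * np.eye(x.shape[0]))
        self.b = self.gamma * self.b + reward * x

    def estimate(self):
        # Solve V theta = b rather than forming an explicit inverse.
        return np.linalg.solve(self.V, self.b)
```

Each update costs one rank-one modification of a single $d \times d$ matrix, which is the kind of per-round cost that window- and restart-based estimators also incur. Smaller values of `gamma` forget old data faster, which tracks drift more closely at the price of a noisier estimate.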