Generalized Linear Bandits (GLBs) are powerful extensions of the Linear Bandit (LB) setting, broadening the benefits of reward parametrization beyond linearity. In this paper we study GLBs in non-stationary environments, characterized by a general metric of non-stationarity known as the variation budget or \emph{parameter drift}, denoted $B_T$. While previous attempts have been made to extend LB algorithms to this setting, they overlook a salient feature of GLBs that undermines their results. In this work, we introduce a new algorithm that addresses this difficulty. We prove that under a geometric assumption on the action set, our approach enjoys a $\tilde{\mathcal{O}}(B_T^{1/3}T^{2/3})$ regret bound. In the general case, we show that it suffers a regret of at most $\tilde{\mathcal{O}}(B_T^{1/5}T^{4/5})$. At the core of our contribution is a generalization of the projection step introduced in Filippi et al. (2010), adapted to the non-stationary nature of the problem. Our analysis sheds light on central mechanisms inherited from the setting by explicitly separating the treatment of the learning and tracking aspects of the problem.
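For context, the projection step of Filippi et al. (2010) referred to above can be recalled as follows. This is a minimal sketch in assumed notation ($\mu$ the inverse link function, $x_s$ the played actions, $\Theta$ the admissible parameter set, $\lambda$ a regularization parameter); the non-stationary adaptation is only hinted at and is not reproduced from this paper.

\[
  g_t(\theta) \;=\; \sum_{s=1}^{t} \mu\!\left(x_s^\top \theta\right) x_s ,
  \qquad
  V_t \;=\; \lambda I + \sum_{s=1}^{t} x_s x_s^\top ,
\]
\[
  \tilde{\theta}_t \;\in\; \operatorname*{arg\,min}_{\theta \in \Theta}
  \bigl\lVert g_t(\theta) - g_t(\hat{\theta}_t) \bigr\rVert_{V_t^{-1}} ,
\]
where $\hat{\theta}_t$ denotes the (possibly inadmissible) maximum-likelihood estimate. In a drifting environment one would expect the sums above to be replaced by discounted or windowed analogues, which is the kind of adaptation the abstract alludes to.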