In this paper, we study a non-stationary stochastic bandit problem, which generalizes the switching bandit problem. On top of the switching bandit problem (\textbf{Case a}), we are interested in three concrete examples: (\textbf{b}) the means of the arms are local polynomials, (\textbf{c}) the means of the arms are locally smooth, and (\textbf{d}) the gaps of the arms have a bounded number of inflexion points and where the highest arm mean cannot vary too much in a short range. These three settings are very different, but have in common the following: (i) the number of similarly-sized level sets of the logarithm of the gaps can be controlled, and (ii) the highest mean has a limited number of abrupt changes, and otherwise has limited variations. We propose a single algorithm in this general setting, that in particular solves in an efficient and unified way the four problems (a)-(d) mentioned.
翻译:在本文中,我们研究了一个非静止的土匪问题,它概括了交换土匪问题。除了交换土匪问题(\ textbf{Case a})之外,我们感兴趣的有三个具体例子:(\ textbf{b}) 武器的手段是局部的多面性,(\ textbf{c}) 武器的手段是局部平稳的,以及(\ textbf{d}) 手臂的缺口有一定数量的伸缩点,而且最高的手臂平均值在短距离内变化不会太大。这三个设置非常不同,但有以下共同点:(一) 差距对数的大小相似的对数可以控制,以及(二) 最高平均值的突变数有限,其他的变数也有限。我们在这个总体环境中提出了一个单一的算法,特别是以有效和统一的方式解决所提到的四个问题(a-(d))。