In this paper, we consider the setting of piecewise i.i.d. bandits under a safety constraint. In this piecewise i.i.d. setting, there exists a finite number of changepoints at which the mean rewards of some or all arms change simultaneously. We introduce the safety constraint studied in \citet{wu2016conservative} to this setting, requiring that at every round the cumulative reward remains above a constant fraction of the cumulative reward of a default action. We propose two actively adaptive algorithms for this setting that satisfy the safety constraint, detect changepoints, and restart without knowledge of the number of changepoints or their locations. We provide regret bounds for our algorithms and show that the bounds are comparable to their counterparts from the safe bandit and piecewise i.i.d. bandit literature. We also provide the first matching lower bounds for this setting. Empirically, we show that our safety-aware algorithms perform comparably to state-of-the-art actively adaptive algorithms that do not satisfy the safety constraint.
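For concreteness, a minimal sketch of the constraint in the style of \citet{wu2016conservative}; the symbols $r_s$, $\mu_0$, and $\alpha$ are illustrative notation not fixed by the abstract. The learner must guarantee, for every round $t$,
\[
\sum_{s=1}^{t} \mathbb{E}\left[ r_s \right] \;\ge\; (1-\alpha)\, t\, \mu_0,
\]
where $r_s$ is the reward collected at round $s$, $\mu_0$ is the mean reward of the default action, and $\alpha \in (0,1)$ controls how much performance may be sacrificed relative to always playing the default action.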