We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021). In the adversarial regime with a self-bounding constraint, and in the stochastic regime with adversarial corruptions as its special case, we improve the dependence on the corruption magnitude $C$. In particular, for $C = \Theta\left(\frac{T}{\log T}\right)$, where $T$ is the time horizon, we achieve an improvement by a multiplicative factor of $\sqrt{\frac{\log T}{\log\log T}}$ relative to the bound of Zimmert and Seldin (2021). We also improve the dependence of the regret bound on the time horizon from $\log T$ to $\log \frac{(K-1)T}{(\sum_{i\neq i^*}\frac{1}{\Delta_i})^2}$, where $K$ is the number of arms, $\Delta_i$ are the suboptimality gaps of the suboptimal arms $i$, and $i^*$ is the optimal arm. Additionally, we provide a general analysis that yields the same kind of improvement for generalizations of Tsallis-INF to settings beyond multi-armed bandits.
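To give a feel for the change in the time-horizon dependence, the following minimal Python sketch evaluates the two logarithmic factors quoted above, $\log T$ and $\log \frac{(K-1)T}{(\sum_{i\neq i^*}\frac{1}{\Delta_i})^2}$, for a hypothetical problem instance. The function name `log_factors` and the example gaps are illustrative assumptions, not part of the paper; the sketch also assumes the argument of the second logarithm exceeds 1 and ignores all constants and other terms in the actual bounds.

```python
import math


def log_factors(gaps, T):
    """Return (log T, log((K-1) T / (sum_i 1/gap_i)^2)).

    gaps: suboptimality gaps of the suboptimal arms only (hypothetical input).
    T:    time horizon.
    """
    K = len(gaps) + 1                      # suboptimal arms plus the optimal arm
    inv_gap_sum = sum(1.0 / d for d in gaps)
    old = math.log(T)
    new = math.log((K - 1) * T / inv_gap_sum ** 2)
    return old, new


# Illustrative instance: three suboptimal arms, horizon of one million rounds.
old, new = log_factors([0.1, 0.2, 0.5], T=10**6)
print(f"log T = {old:.3f}, refined factor = {new:.3f}")
```

The two factors differ by $\log \frac{(\sum_{i\neq i^*} 1/\Delta_i)^2}{K-1}$, which is nonnegative whenever all gaps are at most 1, so the refined factor is smaller when the gaps are small.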