We study the adversarial bandit problem in which the best arm switches $S$ times, with $S$ unknown to the learner. To handle this problem, we adopt the master-base framework with online mirror descent (OMD). We first provide a master-base algorithm with basic OMD that achieves a regret bound of $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$. To improve the dependence of the regret bound on $T$, we propose using adaptive learning rates for OMD to control the variance of the loss estimators, and achieve $\tilde{O}(\min\{\mathbb{E}[\sqrt{SKT\rho_T(h^\dagger)}],S\sqrt{KT}\})$, where $\rho_T(h^\dagger)$ is a variance term for the loss estimators.
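For readers unfamiliar with OMD in the bandit setting, the following is a minimal sketch of a single base learner, not the master-base algorithm studied here. It assumes a negative-entropy regularizer (in which case OMD coincides with Exp3), a fixed learning rate `eta` rather than the adaptive learning rates proposed in this work, and a hypothetical loss table `losses` with entries in $[0, 1]$. The function and variable names are illustrative only.

```python
import math
import random


def omd_negentropy_bandit(losses, eta, seed=0):
    """OMD with a negative-entropy regularizer (equivalent to Exp3).

    losses[t][a] is the loss in [0, 1] of arm a at round t (hypothetical input).
    eta is a fixed learning rate (an assumption; not the adaptive schedule).
    Returns the pulled arms and the sampling distribution of the last round.
    """
    rng = random.Random(seed)
    num_arms = len(losses[0])
    cum_est = [0.0] * num_arms  # cumulative importance-weighted loss estimates
    probs = [1.0 / num_arms] * num_arms
    pulled = []
    for round_losses in losses:
        # Closed-form OMD step for negative entropy:
        # p_a is proportional to exp(-eta * cum_est[a]).
        shift = min(cum_est)  # subtract the minimum for numerical stability
        weights = [math.exp(-eta * (c - shift)) for c in cum_est]
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        pulled.append(arm)
        # Bandit feedback: only the pulled arm's loss is observed. Dividing by
        # its sampling probability gives an unbiased estimate of the loss vector.
        cum_est[arm] += round_losses[arm] / probs[arm]
    return pulled, probs


if __name__ == "__main__":
    # Two arms, 200 rounds; arm 0 always has loss 0 and arm 1 always has loss 1.
    table = [[0.0, 1.0] for _ in range(200)]
    arms, final_probs = omd_negentropy_bandit(table, eta=0.1)
    print(final_probs)
```

The estimate `round_losses[arm] / probs[arm]` grows large when the pulled arm had a small sampling probability, which is the kind of estimator variance that a choice of learning rate is meant to keep under control.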