In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have $\alpha$-th ($1<\alpha\le 2$) moments bounded by $\sigma^\alpha$, while the variances may not exist. Specifically, we design an algorithm \texttt{HTINF}: when the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori. When $\alpha$ and $\sigma$ are unknown, \texttt{HTINF} achieves a $\log T$-style instance-dependent regret bound in stochastic cases and an $o(T)$ no-regret guarantee in adversarial cases. We further develop an algorithm \texttt{AdaTINF}, which achieves the $\mathcal O(\sigma K^{1-\nicefrac 1\alpha}T^{\nicefrac{1}{\alpha}})$ minimax-optimal regret even in adversarial settings, without prior knowledge of $\alpha$ and $\sigma$. This result matches the known regret lower bound (Bubeck et al., 2013), which was established in the stochastic setting with both $\alpha$ and $\sigma$ known. To our knowledge, the proposed \texttt{HTINF} algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and \texttt{AdaTINF} is the first algorithm that adapts to both $\alpha$ and $\sigma$ and achieves the optimal gap-independent regret bound in the classical heavy-tailed stochastic MAB setting and in our novel adversarial formulation.
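In symbols, writing $\ell_{t,i}$ for the loss of arm $i\in[K]$ at round $t\in[T]$ (a notational convention assumed here for illustration), the heavy-tail condition above requires
\[
\mathbb E\bigl[|\ell_{t,i}|^{\alpha}\bigr] \le \sigma^{\alpha}, \qquad 1<\alpha\le 2,
\]
so the variance may be infinite whenever $\alpha<2$, and the minimax rate attained by \texttt{AdaTINF} reads
\[
\mathcal R_T = \mathcal O\bigl(\sigma K^{1-\nicefrac{1}{\alpha}}\,T^{\nicefrac{1}{\alpha}}\bigr),
\]
where $\mathcal R_T$ denotes the (pseudo-)regret over $T$ rounds.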