This study considers online learning with general directed feedback graphs. For this problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments as well as poly-logarithmic regret bounds for stochastic environments. As Alon et al. [2015] have shown, tight regret bounds depend on the structure of the feedback graph: \textit{strongly observable} graphs yield minimax regret of $\tilde{\Theta}(\alpha^{1/2} T^{1/2})$, while \textit{weakly observable} graphs induce minimax regret of $\tilde{\Theta}(\delta^{1/3} T^{2/3})$, where $\alpha$ and $\delta$, respectively, represent the independence number of the graph and the domination number of a certain portion of the graph. Our proposed algorithm for strongly observable graphs has a regret bound of $\tilde{O}(\alpha^{1/2} T^{1/2})$ for adversarial environments, as well as of $O\bigl(\frac{\alpha (\ln T)^3}{\Delta_{\min}}\bigr)$ for stochastic environments, where $\Delta_{\min}$ denotes the minimum suboptimality gap. This result resolves an open question raised by Erez and Koren [2021]. We also provide an algorithm for weakly observable graphs that achieves a regret bound of $\tilde{O}(\delta^{1/3} T^{2/3})$ for adversarial environments and poly-logarithmic regret for stochastic environments. The proposed algorithms are based on the follow-the-perturbed-leader approach combined with newly designed update rules for learning rates.