Quantile (and, more generally, KL) regret bounds, such as those achieved by NormalHedge (Chaudhuri, Freund, and Hsu 2009) and its variants, relax the goal of competing against the best individual expert to competing only against a majority of experts on adversarial data. More recently, the semi-adversarial paradigm (Bilodeau, Negrea, and Roy 2020) provides an alternative relaxation of adversarial online learning by considering data that may be neither fully adversarial nor stochastic (i.i.d.). We achieve the minimax optimal regret in both paradigms using FTRL with separate, novel, root-logarithmic regularizers, both of which can be interpreted as yielding variants of NormalHedge. We extend existing KL regret upper bounds, which hold uniformly over target distributions, to possibly uncountable expert classes with arbitrary priors; provide the first full-information lower bounds for quantile regret on finite expert classes (these bounds are tight); and provide an adaptively minimax optimal algorithm for the semi-adversarial paradigm that adapts to the true, unknown constraint faster, leading to uniformly improved regret bounds over existing methods.
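Since both regularizers are described as yielding variants of NormalHedge, a minimal sketch of the original NormalHedge update (Chaudhuri, Freund, and Hsu 2009) may help fix ideas. This is not the paper's FTRL algorithm or its root-logarithmic regularizers; the expert count, horizon, random loss sequence, and reported quantile below are all illustrative assumptions.

```python
# Sketch of NormalHedge (Chaudhuri, Freund, and Hsu 2009): weights are driven
# by the potential exp([R]_+^2 / (2c)), with the scale c chosen each round so
# that the average potential over experts equals e.
import numpy as np

def normal_hedge_weights(R):
    """NormalHedge weights from cumulative regrets R (shape [N])."""
    n = len(R)
    R_plus = np.maximum(R, 0.0)
    if not np.any(R_plus > 0.0):
        # No expert is ahead of the learner yet: play uniformly.
        return np.full(n, 1.0 / n)
    # Solve (1/N) * sum_i exp(R_plus_i^2 / (2c)) = e for c > 0 by bisection;
    # the left-hand side is strictly decreasing in c.
    def excess(c):
        return np.mean(np.exp(R_plus ** 2 / (2.0 * c))) - np.e
    # Safe lower bracket: here the largest term alone makes the mean >= e.
    lo = R_plus.max() ** 2 / (2.0 * np.log(n * np.e))
    hi = 2.0 * lo
    while excess(hi) > 0.0:  # grow the upper bracket until the mean < e
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    w = (R_plus / c) * np.exp(R_plus ** 2 / (2.0 * c))
    return w / w.sum()

rng = np.random.default_rng(0)
n_experts, horizon = 50, 1000             # illustrative sizes (assumptions)
R = np.zeros(n_experts)                   # cumulative regret to each expert
for t in range(horizon):
    p = normal_hedge_weights(R)
    losses = rng.uniform(size=n_experts)  # stand-in loss sequence (i.i.d. here)
    R += p @ losses - losses              # instantaneous regret update
# Regret relative to the expert at the top-10% quantile boundary.
print("0.1-quantile regret:", np.sort(R)[-(n_experts // 10)])
```

The final line illustrates the quantile-regret criterion from the abstract: rather than the regret to the single best expert, one reports the regret to the expert ranked at a fixed quantile of the class, which the algorithm controls without tuning a learning rate.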