We provide the first sub-linear space and sub-linear regret algorithm for online learning with expert advice (against an oblivious adversary), addressing an open question raised recently by Srinivas, Woodruff, Xu and Zhou (STOC 2022). We also demonstrate a separation between oblivious and (strong) adaptive adversaries by proving a linear memory lower bound for any sub-linear regret algorithm against an adaptive adversary. Our algorithm is based on a novel pool selection procedure that bypasses the traditional wisdom of leader selection in online learning, and a generic reduction that transforms any algorithm with weakly sub-linear regret $o(T)$ into an algorithm with $T^{1-\alpha}$ regret, which may be of independent interest. Our lower bound utilizes the connection between no-regret learning and equilibrium computation in zero-sum games, yielding a strong lower bound against an adaptive adversary.
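For context, the sketch below is a minimal illustration (not the paper's algorithm) of the classical multiplicative-weights baseline for online learning with expert advice: it attains $O(\sqrt{T \log n})$ regret against an oblivious adversary, but it stores one weight per expert and therefore uses $\Theta(n)$ memory, which is exactly the linear-space barrier that the sub-linear space result above addresses. The function name, parameters, and loss interface here are illustrative assumptions.

```python
import math
import random

def multiplicative_weights(n_experts, T, losses, eta=None):
    """Classical experts baseline (illustrative only, not the paper's algorithm).

    losses: callable (t, i) -> loss of expert i at round t, assumed in [0, 1].
    Keeps one weight per expert, i.e. Theta(n_experts) memory.
    Returns the learner's total loss over T rounds.
    """
    if eta is None:
        eta = math.sqrt(math.log(n_experts) / T)  # standard learning rate
    weights = [1.0] * n_experts                   # linear memory: one weight per expert
    total_loss = 0.0
    for t in range(T):
        total = sum(weights)
        probs = [w / total for w in weights]
        i = random.choices(range(n_experts), weights=probs)[0]  # follow a randomly drawn expert
        total_loss += losses(t, i)
        # multiplicative update: downweight experts that incurred loss this round
        weights = [w * math.exp(-eta * losses(t, j)) for j, w in enumerate(weights)]
    return total_loss
```

The point of the sketch is the memory profile, not the regret bound: every standard experts algorithm of this form maintains per-expert state, and the abstract's first result shows that sub-linear regret is nevertheless achievable with sub-linear space against an oblivious adversary, while its lower bound shows this is impossible against an adaptive one.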