We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret relative to fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games which includes all convex-concave and monotone games), and when the players only have access to noisy estimates of their individual payoff gradients. If the noise is additive, the game-theoretic and purely adversarial settings enjoy similar regret guarantees; however, if the noise is multiplicative, we show that the learners can, in fact, achieve constant regret. We achieve this faster rate via an optimistic gradient scheme with learning rate separation -- that is, the method's extrapolation and update steps are tuned to different schedules, depending on the noise profile. Subsequently, to eliminate the need for delicate hyperparameter tuning, we propose a fully adaptive method that smoothly interpolates between worst- and best-case regret guarantees.
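The learning-rate-separation idea above can be illustrated with a minimal sketch. The game (a bilinear min-max problem), the multiplicative-noise model, and all step sizes below are illustrative assumptions, not the paper's actual experimental setup; in particular, the constant step sizes stand in for the tuned schedules the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

def V(z):
    # Joint gradient field of the bilinear min-max game f(x, y) = x * y:
    # player x minimizes, player y maximizes. Its unique equilibrium is (0, 0).
    x, y = z
    return np.array([y, -x])

def noisy_V(z, sigma=0.2):
    # Multiplicative noise: a relative perturbation of the gradient, so the
    # noise vanishes near the equilibrium (one common model for this profile).
    return V(z) * (1.0 + sigma * rng.standard_normal(2))

# Optimistic gradient with learning-rate separation: the extrapolation
# (leading) step uses gamma, the update (base) step uses a smaller eta.
z = np.array([1.0, 1.0])
gamma, eta = 0.5, 0.1  # illustrative constants in place of tuned schedules
for t in range(2000):
    z_half = z - gamma * noisy_V(z)   # extrapolation step
    z = z - eta * noisy_V(z_half)     # update step

print(np.linalg.norm(z))  # distance to the equilibrium (0, 0)
```

Because the noise is multiplicative, the perturbation shrinks with the gradient as the iterates approach the equilibrium, which is the mechanism behind the constant-regret regime; with additive noise of fixed variance, the iterates would instead hover in a noise ball around the solution.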