A recent line of work has established uncoupled learning dynamics such that, when employed by all players in a game, each player's \emph{regret} after $T$ repetitions grows polylogarithmically in $T$, an exponential improvement over the traditional guarantees within the no-regret framework. However, so far these results have been limited to certain classes of games with structured strategy spaces -- such as normal-form and extensive-form games. Whether $O(\text{polylog}\, T)$ regret bounds can be obtained for general convex and compact strategy sets -- which occur in many fundamental models in economics and multiagent systems -- while retaining efficient strategy updates is an important open question. In this paper, we answer this in the positive by establishing the first uncoupled learning algorithm with $O(\log T)$ per-player regret in general \emph{convex games}, that is, games with concave utility functions supported on arbitrary convex and compact strategy sets. Our learning dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately \emph{lifted} space using a \emph{self-concordant regularizer} that is, peculiarly, not a barrier for the feasible region. Further, our learning dynamics are efficiently implementable given access to a proximal oracle for the convex strategy set, leading to $O(\log\log T)$ per-iteration complexity; we also give extensions when access to only a \emph{linear} optimization oracle is assumed. Finally, we adapt our dynamics to guarantee $O(\sqrt{T})$ regret in the adversarial regime. Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret bounds either in terms of the dependence on the number of iterations or on the dimension of the strategy sets.
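For orientation, we recall the generic optimistic follow-the-regularized-leader (OFTRL) template that our dynamics instantiate; the notation below is illustrative only, as our actual construction applies this update over a suitably lifted space with a specific self-concordant regularizer rather than directly over the strategy set. Given a convex and compact set $\mathcal{X}$, a regularizer $R$, and a learning rate $\eta > 0$, at each time $t$ the player selects
\[
x^{(t)} \in \operatorname*{argmax}_{x \in \mathcal{X}} \left\{ \left\langle x,\; m^{(t)} + \sum_{\tau=1}^{t-1} u^{(\tau)} \right\rangle - \frac{1}{\eta}\, R(x) \right\},
\]
where $u^{(\tau)}$ denotes the utility vector observed at time $\tau$ and $m^{(t)}$ is a prediction of the forthcoming utility, commonly taken to be the most recent observation $m^{(t)} := u^{(t-1)}$.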