Nash equilibrium is a central concept in game theory. Several Nash solvers exist, yet none scale to normal-form games with many actions and many players, especially those with payoff tensors too big to be stored in memory. In this work, we propose an approach that iteratively improves an approximation to a Nash equilibrium through joint play. It accomplishes this by tracing a previously established homotopy that connects instances of the game defined with decaying levels of entropy regularization. To encourage iterates to remain near this path, we efficiently minimize \emph{average deviation incentive} via stochastic gradient descent, intelligently sampling entries in the payoff tensor as needed. This process can also be viewed as constructing and reacting to a polymatrix approximation to the game. In these ways, our proposed approach, \emph{average deviation incentive descent with adaptive sampling} (ADIDAS), is most similar to three classical approaches, namely homotopy-type, Lyapunov, and iterative polymatrix solvers. We demonstrate through experiments the ability of this approach to approximate a Nash equilibrium in normal-form games with as many as seven players and twenty-one actions (over one trillion outcomes) that are orders of magnitude larger than those possible with prior algorithms.
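To make the minimized objective concrete, the following is a minimal sketch of an exploitability-style definition of average deviation incentive; the notation ($u_k$, $x_k$, $\Delta_k$, $\tau$) is introduced here for illustration and is not necessarily the paper's exact formulation:
\[
\mathcal{L}_{\mathrm{ADI}}(x) \;=\; \frac{1}{n}\sum_{k=1}^{n}\Big(\max_{z_k\in\Delta_k} u_k(z_k, x_{-k}) \;-\; u_k(x_k, x_{-k})\Big),
\]
where $u_k$ is player $k$'s expected payoff, $x_{-k}$ denotes the other players' mixed strategies, and $\Delta_k$ is player $k$'s strategy simplex. Under this reading, the entropy-regularized games along the homotopy would replace the inner $\max$ with a softmax (logit) response at temperature $\tau$; annealing $\tau$ toward zero recovers the unregularized objective, while stochastic estimates of $u_k$ from sampled payoff-tensor entries keep each gradient step tractable in large games.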