The theory of learning in games is prominent in the AI community, motivated by emerging applications such as multi-agent reinforcement learning and Generative Adversarial Networks. We propose Mutation-driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and prove that it exhibits last-iterate convergence in both full- and noisy-information feedback settings. In the full-information feedback setting, the players observe the exact gradient vectors of their utility functions. In the noisy-information feedback setting, by contrast, they observe only noisy gradient vectors. Existing algorithms, including the well-known Multiplicative Weights Update (MWU) and Optimistic MWU (OMWU) algorithms, fail to converge to a Nash equilibrium under noisy-information feedback. In contrast, M2WU exhibits last-iterate convergence to a stationary point near a Nash equilibrium in both feedback settings. We then prove that it converges to an exact Nash equilibrium when the mutation term is adapted iteratively. We empirically confirm that M2WU outperforms MWU and OMWU in exploitability and convergence rates.
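To illustrate the idea behind a mutation-driven multiplicative-weights update, the sketch below runs standard MWU with an added mutation term that pulls each strategy toward a fixed reference strategy, on the matching-pennies game with exact (full-information) gradient feedback. The update form `mu * (ref / pi - 1)`, the step size `eta`, the mutation rate `mu`, and the uniform reference strategy are illustrative assumptions, not the paper's exact definition of M2WU.

```python
import numpy as np

def m2wu_step(pi, grad, ref, eta=0.1, mu=0.1):
    """One multiplicative-weights step with an illustrative mutation term.

    mu * (ref / pi - 1) pulls the strategy pi toward the reference
    strategy ref; eta, mu, and this exact form are assumptions here.
    """
    logits = np.log(pi) + eta * (grad + mu * (ref / pi - 1.0))
    w = np.exp(logits - logits.max())  # stabilize before normalizing
    return w / w.sum()

# Matching pennies: payoff matrix for the row player (zero-sum game).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])

x = np.array([0.9, 0.1])   # row player's mixed strategy
y = np.array([0.2, 0.8])   # column player's mixed strategy
ref_x, ref_y = np.full(2, 0.5), np.full(2, 0.5)  # uniform reference

for _ in range(5000):
    gx, gy = A @ y, -A.T @ x  # exact gradient (full-information) feedback
    x = m2wu_step(x, gx, ref_x)
    y = m2wu_step(y, gy, ref_y)

print(np.round(x, 2), np.round(y, 2))  # last iterate, near (0.5, 0.5)
```

With a uniform reference strategy, the stationary point of this mutated dynamic coincides with the Nash equilibrium (0.5, 0.5) of matching pennies by symmetry, whereas plain MWU (mu = 0) cycles around it without its last iterate converging.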