This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent both as an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) being the first quantal response equilibrium solver to achieve linear convergence for extensive-form games with first-order feedback; 2) being the first standard reinforcement learning algorithm to achieve empirically competitive results with CFR in tabular settings; 3) achieving favorable performance in 3x3 Dark Hex and Phantom Tic-Tac-Toe as a self-play deep reinforcement learning algorithm.
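To make the flavor of the algorithm concrete, the following is a minimal sketch of a magnetic-mirror-descent-style update on the probability simplex, run in self-play on rock-paper-scissors. It assumes entropic (KL) regularization and a uniform magnet; the names `eta`, `alpha`, and the closed-form update shown are illustrative assumptions, not the paper's exact presentation.

```python
import numpy as np

def mmd_step(x, grad, magnet, eta=0.1, alpha=0.05):
    """One sketched update, solving (in closed form over the simplex):
        argmin_z  eta*<grad, z> + eta*alpha*KL(z, magnet) + KL(z, x)
    With entropic regularization this yields a softmax-style expression."""
    logits = (np.log(x) + eta * alpha * np.log(magnet) - eta * grad) / (1.0 + eta * alpha)
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()

# Rock-paper-scissors payoff matrix for the row player (a two-player
# zero-sum matrix game); the row player maximizes x^T A y.
A = np.array([[ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])

x = np.ones(3) / 3       # row player's strategy
y = np.ones(3) / 3       # column player's strategy
magnet = np.ones(3) / 3  # uniform magnet (regularization target)

for _ in range(2000):
    gx = -A @ y          # row player's loss gradient (maximizer)
    gy = A.T @ x         # column player's loss gradient (minimizer)
    x = mmd_step(x, gx, magnet)
    y = mmd_step(y, gy, magnet)

# For this symmetric game with a uniform magnet, both strategies
# approach the uniform distribution.
print(np.round(x, 3), np.round(y, 3))
```

The magnet term is what distinguishes this from plain mirror descent: without it (`alpha = 0`), self-play on rock-paper-scissors cycles rather than converging.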