This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent both as an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) being the first quantal response equilibrium solver to achieve linear convergence for extensive-form games with first-order feedback; 2) being the first standard reinforcement learning algorithm to achieve empirically competitive results with CFR in tabular settings; 3) achieving favorable performance in 3x3 Dark Hex and Phantom Tic-Tac-Toe as a self-play deep reinforcement learning algorithm.
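The abstract does not state the update rule, but with the negative-entropy mirror map (KL divergence) over the probability simplex, a magnetic mirror descent step admits a closed form: the new strategy is a normalized geometric mixture of the previous iterate, the magnet distribution, and the gradient direction. The sketch below illustrates this on rock-paper-scissors in self-play with a uniform magnet; all names, the step size `eta`, and the regularization temperature `alpha` are our illustrative choices, not values taken from the paper.

```python
import numpy as np

def mmd_step(x, grad, magnet, eta, alpha):
    # Closed-form magnetic mirror descent step on the simplex with the
    # negative-entropy mirror map: the iterate follows the (negative)
    # gradient while being pulled toward the magnet distribution,
    # with pull strength controlled by alpha.
    logits = (np.log(x) + eta * alpha * np.log(magnet) - eta * grad) / (1.0 + eta * alpha)
    z = np.exp(logits - logits.max())   # subtract max for numerical stability
    return z / z.sum()

# Rock-paper-scissors payoff matrix for the row player (zero-sum game).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
magnet = np.full(3, 1.0 / 3.0)   # uniform magnet; by symmetry the QRE is uniform
x = np.array([0.6, 0.3, 0.1])    # row player's initial strategy
y = np.array([0.2, 0.5, 0.3])    # column player's initial strategy
eta, alpha = 0.05, 0.5           # illustrative hyperparameters

for _ in range(5000):
    gx = -(A @ y)    # row player minimizes -x^T A y
    gy = A.T @ x     # column player minimizes x^T A y
    x, y = mmd_step(x, gx, magnet, eta, alpha), mmd_step(y, gy, magnet, eta, alpha)
```

With simultaneous updates and a uniform magnet, both strategies are driven toward the uniform quantal response equilibrium; the magnet term damps the cycling that plain simultaneous gradient play exhibits on bilinear games.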