Many learning problems involve multiple agents that optimize different, mutually interacting objective functions. In these problems, standard policy gradient algorithms fail because of the non-stationarity of the setting and the conflicting interests of the agents. Indeed, an algorithm must account for the complex dynamics of these systems to guarantee rapid convergence towards a (local) Nash equilibrium. In this paper, we propose NOHD (Newton Optimization on Helmholtz Decomposition), a Newton-like algorithm for multi-agent learning problems based on the decomposition of the system dynamics into its irrotational (potential) and solenoidal (Hamiltonian) components. This method ensures quadratic convergence in purely irrotational and purely solenoidal systems. Furthermore, we show that NOHD is attracted to stable fixed points in general multi-agent systems and repelled by strict saddle points. Finally, we empirically compare the performance of NOHD with that of state-of-the-art algorithms on some bimatrix games and in a continuous Gridworld environment.
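To make the decomposition concrete, the following is a minimal numerical sketch (in NumPy; the names `nohd_like_step` and `jacobian_fd` are hypothetical, and the combined update is an illustrative stand-in, not the paper's exact rule). It splits the Jacobian of the joint gradient field into its symmetric part, the standard proxy for the irrotational (potential) component, and its antisymmetric part, the proxy for the solenoidal (Hamiltonian) component, and then takes a damped Newton-like step that reduces to a Newton step on the potential when the antisymmetric part vanishes and to a Gauss-Newton step on H(x) = ½‖ξ(x)‖² when the symmetric part vanishes.

```python
import numpy as np

def jacobian_fd(xi, x, eps=1e-6):
    """Finite-difference Jacobian of the joint gradient field xi at x."""
    n = x.size
    f0 = xi(x)
    J = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        J[:, i] = (xi(x + e) - f0) / eps
    return J

def nohd_like_step(x, xi, damping=1e-3):
    """Illustrative Newton-like step on the Helmholtz split of the dynamics.

    Limiting behaviour of this stand-in update:
      - purely irrotational (A = 0): damped Newton step on the potential,
        since S is then the Hessian of the potential and xi its gradient;
      - purely solenoidal (S = 0): Gauss-Newton step on H = 0.5 * ||xi||^2,
        whose gradient is A.T @ xi and whose Gauss-Newton Hessian is A.T @ A.
    """
    g = xi(x)
    J = jacobian_fd(xi, x)
    S = 0.5 * (J + J.T)  # symmetric part: irrotational (potential) component
    A = 0.5 * (J - J.T)  # antisymmetric part: solenoidal (Hamiltonian) component
    P = S @ S + A.T @ A + damping * np.eye(x.size)  # PSD curvature surrogate
    return x - np.linalg.solve(P, (S + A.T) @ g)

# Example: the bilinear game f1(x, y) = x*y, f2(x, y) = -x*y is purely
# solenoidal; simultaneous gradient descent cycles around the equilibrium,
# while the step above jumps almost directly to (0, 0).
xi = lambda v: np.array([v[1], -v[0]])  # (df1/dx, df2/dy)
z = np.array([1.0, 1.0])
for _ in range(5):
    z = nohd_like_step(z, xi)
print(z)  # close to the Nash equilibrium (0, 0)
```

On the bilinear example the Jacobian is exactly antisymmetric, so the sketch contracts towards the equilibrium in essentially one step, consistent with the quadratic convergence claimed for purely solenoidal systems.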