Recent advances in multiagent learning have seen the introduction of a family of algorithms that revolve around the population-based training method PSRO, showing convergence to Nash, correlated and coarse correlated equilibria. Notably, when the number of agents increases, learning best-responses becomes exponentially more difficult, and as such hampers PSRO training methods. The paradigm of mean-field games provides an asymptotic solution to this problem when the considered games are anonymous-symmetric. Unfortunately, the mean-field approximation introduces non-linearities which prevent a straightforward adaptation of PSRO. Building upon optimization and adversarial regret minimization, this paper sidesteps this issue and introduces mean-field PSRO, an adaptation of PSRO which learns Nash, coarse correlated and correlated equilibria in mean-field games. The key is to replace the exact distribution computation step by newly-defined mean-field no-adversarial-regret learners, or by black-box optimization. We compare the asymptotic complexity of the approach to standard PSRO, greatly improve empirical bandit convergence speed by compressing temporal mixture weights, and ensure it is theoretically robust to payoff noise. Finally, we illustrate the speed and accuracy of mean-field PSRO on several mean-field games, demonstrating convergence to strong and weak equilibria.
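To make the population-based loop concrete, below is a minimal sketch (not the paper's implementation) on a toy anonymous-symmetric game: each agent picks an action in [0, 1] and its payoff depends only on its own action and the population's mean action. The exact distribution computation step of standard PSRO is replaced by a Hedge-style multiplicative-weights update, standing in here for the paper's mean-field no-adversarial-regret learners (black-box optimization would be an alternative). All function names, the toy payoff, and the learning rate are illustrative assumptions, not the authors' code.

    import numpy as np

    # Toy anonymous-symmetric game (hypothetical, for illustration only):
    # payoff depends only on an agent's own action and the mean field.
    def payoff(action, mean_action):
        # Agents prefer to play half the population mean (a contraction,
        # so the unique mean-field Nash equilibrium is at action 0).
        return -(action - 0.5 * mean_action) ** 2

    def best_response(mean_action):
        # Exact best response for this toy payoff.
        return 0.5 * mean_action

    def mean_field_psro(n_iters=30, lr=2.0):
        """Sketch of a mean-field PSRO loop: expand the population with a
        best response to the current mean field, then update the
        meta-distribution with a no-adversarial-regret (Hedge) step."""
        population = [1.0]            # seed policy (an action)
        weights = np.ones(1)          # meta-distribution over the population
        for _ in range(n_iters):
            probs = weights / weights.sum()
            mean_action = float(np.dot(probs, population))  # induced mean field
            population.append(best_response(mean_action))   # population expansion
            weights = np.append(weights, weights.mean())
            gains = np.array([payoff(a, mean_action) for a in population])
            # Multiplicative-weights (Hedge) update; max-subtraction for stability.
            weights = weights * np.exp(lr * (gains - gains.max()))
        probs = weights / weights.sum()
        return population, probs

    if __name__ == "__main__":
        pop, probs = mean_field_psro()
        print("equilibrium mean action ~", float(np.dot(probs, pop)))  # tends to 0

In this contraction game the mixture's mean action shrinks towards the equilibrium at 0; the real algorithm handles general mean-field payoffs, noise, and the compressed temporal mixture weights mentioned above.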