Independent learners are learning agents that naively employ single-agent learning algorithms in multi-agent systems, intentionally ignoring the effect of other strategic agents present in their environment. This paper studies $N$-player mean-field games from a decentralized learning perspective with two primary objectives: (i) to study the convergence properties of independent learners, and (ii) to identify structural properties of $N$-player mean-field games that can guide algorithm design. Toward the first objective, we study the learning iterates obtained by independent learners, and we use recent results from POMDP theory to show that these iterates converge under mild conditions. In particular, we consider four information structures corresponding to information at each agent: (1) global state + local action; (2) local state, mean-field state + local action; (3) local state, compressed mean-field state + local action; (4) local state with local action. We present a notion of subjective equilibrium suitable for the analysis of independent learners. Toward the second objective, we study a family of dynamical systems on the set of joint policies. The dynamical systems under consideration are subject to a so-called $\epsilon$-satisficing condition: agents who are subjectively $\epsilon$-best-responding at a given joint policy do not change their policy. We establish a useful structural property relating to such dynamical systems. Finally, we develop an independent learning algorithm for $N$-player mean-field games that drives play to subjective $\epsilon$-equilibrium under self-play, exploiting the aforementioned structural properties to guarantee convergence of policies. Notably, we avoid requiring agents to follow the same policy (via a representative agent) during the learning process, which has been the typical approach in the existing literature on learning for mean-field games.
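To make the $\epsilon$-satisficing condition concrete, the following is a minimal sketch of one step of a satisficing dynamic on joint policies. It is a toy illustration, not the algorithm developed in the paper. The names `satisficing_step`, `toy_gap`, and `run_until_satisfied` are hypothetical. The gap function is a stand-in for an agent's subjective sub-optimality (here it depends only on the empirical distribution of policies), and uniform re-sampling by unsatisfied agents is an assumption made purely for illustration.

```python
import random
from collections import Counter
from typing import Callable, Tuple

JointPolicy = Tuple[int, ...]  # one policy index per agent (hypothetical encoding)
GapFn = Callable[[int, JointPolicy], float]


def toy_gap(i: int, joint: JointPolicy) -> float:
    """Toy stand-in for agent i's subjective sub-optimality gap.

    ASSUMPTION: an agent is 'best-responding' (gap 0) iff it plays the most
    common policy in the population, with ties broken by the smallest index.
    This is not the paper's subjective value function.
    """
    counts = Counter(joint)
    top = max(counts.values())
    mode = min(p for p, c in counts.items() if c == top)
    return 0.0 if joint[i] == mode else 1.0


def satisficing_step(joint: JointPolicy, num_policies: int, gap: GapFn,
                     eps: float, rng: random.Random) -> JointPolicy:
    """One step of an eps-satisficing update.

    Agents whose gap at the current joint policy is at most eps keep their
    policy; the others re-sample uniformly at random (an assumed rule).
    """
    nxt = []
    for i, pi in enumerate(joint):
        if gap(i, joint) <= eps:
            nxt.append(pi)  # satisfied: do not change policy
        else:
            nxt.append(rng.randrange(num_policies))  # unsatisfied: explore
    return tuple(nxt)


def run_until_satisfied(joint: JointPolicy, num_policies: int, gap: GapFn,
                        eps: float, rng: random.Random,
                        max_steps: int = 1000) -> JointPolicy:
    """Iterate the update until every agent is eps-satisfied or steps run out."""
    for _ in range(max_steps):
        if all(gap(i, joint) <= eps for i in range(len(joint))):
            return joint  # fixed point of the satisficing dynamic
        joint = satisficing_step(joint, num_policies, gap, eps, rng)
    return joint


if __name__ == "__main__":
    start = (0, 1, 2, 2, 1)
    print(run_until_satisfied(start, 3, toy_gap, 0.1, random.Random(0)))
```

In this sketch, any joint policy at which all agents are $\epsilon$-satisfied is a fixed point of the update, and agents are free to hold different policies along the way; no representative agent is imposed.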