Independent learners are agents that employ single-agent algorithms in multi-agent systems, intentionally ignoring the effect of other strategic agents. This paper studies mean-field games from a decentralized learning perspective, with two primary objectives: (i) to identify structure that can guide algorithm design, and (ii) to understand the emergent behaviour in systems of independent learners. We study a new model of partially observed mean-field games with finitely many players, local action observability, and a general observation channel for partial observations of the global state. Specific observation channels considered include (a) global observability, (b) local and mean-field observability, (c) local and compressed mean-field observability, and (d) only local observability. We establish conditions under which the control problem of a given agent is equivalent to a fully observed MDP, as well as conditions under which the control problem is equivalent only to a POMDP. Building on the connection to MDPs, we prove the existence of perfect equilibrium among memoryless stationary policies under mean-field observability. Leveraging the connection to POMDPs, we prove convergence of learning iterates obtained by independent learning agents under any of the aforementioned observation channels. We interpret the limiting values as subjective value functions, which an agent believes to be relevant to its control problem. These subjective value functions are then used to propose subjective Q-equilibrium, a new solution concept for partially observed n-player mean-field games, whose existence is proved under mean-field or global observability.We provide a decentralized learning algorithm for partially observed n-player mean-field games, and we show that it drives play to subjective Q-equilibrium by adapting the recently developed theory of satisficing paths to allow for subjectivity.
翻译:暂无翻译