We introduce networked communication to the mean-field game framework, in particular to oracle-free settings where $N$ decentralised agents learn along a single, non-episodic evolution path of the empirical system. We prove that our architecture, with only a few reasonable assumptions about network structure, has sample guarantees bounded between those of the centralised- and independent-learning cases. We discuss how the sample guarantees of the three theoretical algorithms (networked, centralised and independent) do not in fact translate into practical convergence. Accordingly, we show that in practical settings where the theoretical parameters are not observed (leading to poor estimation of the Q-function), our communication scheme significantly accelerates convergence over the independent case, without relying on the undesirable assumption of a centralised controller. We contribute several further practical enhancements to all three theoretical algorithms, allowing us to present their first empirical demonstrations. Our experiments confirm that we can remove several of the algorithms' key theoretical assumptions, and they demonstrate the empirical convergence benefits brought by our new networked communication scheme. We additionally show that the networked approach has significant advantages, over both the centralised and independent alternatives, in terms of robustness to unexpected learning failures and to changes in population size.