In this work, we attempt to bridge the two fields of finite-agent and infinite-agent games by studying how the optimal policies of agents evolve with the number of agents (population size) in mean-field games. This is an agent-centric perspective, in contrast to existing works that typically focus on the convergence of the empirical distribution of the population. To this end, the premise is to obtain the optimal policies of a set of finite-agent games with different population sizes. However, deriving a closed-form solution for each game is theoretically intractable, training a distinct policy for each game is computationally intensive, and directly applying a policy trained in one game to other games is sub-optimal. We address these challenges through Population-size-Aware Policy Optimization (PAPO). Our contributions are three-fold. First, to efficiently generate effective policies for games with different population sizes, we propose PAPO, which unifies two natural options (augmentation and hypernetwork) and achieves significantly better performance than either alone. PAPO consists of three components: i) population-size encoding, which transforms the raw population size into an equivalent encoding to avoid training collapse, ii) a hypernetwork that generates a distinct policy for each game conditioned on the population size, and iii) the population size as an additional input to the generated policy. Next, we construct a multi-task-based training procedure to efficiently train the neural networks of PAPO by sampling data from multiple games with different population sizes. Finally, extensive experiments on multiple environments show the significant superiority of PAPO over baselines, and the analysis of the evolution of the generated policies further deepens our understanding of the two fields of finite-agent and infinite-agent games.
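To make the three components concrete, below is a minimal, illustrative sketch of a PAPO-style policy generator. It is not the authors' implementation: the choice of encoding (1/n), the layer sizes, and all names are hypothetical assumptions, shown only to clarify how a hypernetwork can map an encoded population size to the weights of a policy that also receives the population size as input.

```python
import torch
import torch.nn as nn

# Hypothetical encoding of the raw population size into a bounded value
# (one simple choice of "equivalent encoding" to avoid training collapse).
def encode_population_size(n: torch.Tensor) -> torch.Tensor:
    return 1.0 / n.float()

class PolicyHypernetwork(nn.Module):
    """Sketch: hypernetwork that emits the weights of a small per-game policy."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64, hyper_hidden: int = 128):
        super().__init__()
        self.obs_dim, self.act_dim, self.hidden = obs_dim, act_dim, hidden
        # The generated policy also takes the encoded population size as input,
        # so its input dimension is obs_dim + 1.
        self._in_dim = obs_dim + 1
        self._n_params = (self._in_dim * hidden + hidden) + (hidden * act_dim + act_dim)
        # Hypernetwork: encoded population size -> flat parameter vector of the policy.
        self.hyper = nn.Sequential(
            nn.Linear(1, hyper_hidden), nn.ReLU(), nn.Linear(hyper_hidden, self._n_params)
        )

    def forward(self, obs: torch.Tensor, n: torch.Tensor) -> torch.Tensor:
        enc = encode_population_size(n).unsqueeze(-1)      # (B, 1)
        params = self.hyper(enc)                           # (B, n_params)
        x = torch.cat([obs, enc], dim=-1)                  # population size as extra policy input
        # Unpack the flat parameter vector and run the generated two-layer policy.
        i = 0
        w1 = params[:, i:i + self._in_dim * self.hidden].view(-1, self.hidden, self._in_dim)
        i += self._in_dim * self.hidden
        b1 = params[:, i:i + self.hidden]; i += self.hidden
        w2 = params[:, i:i + self.hidden * self.act_dim].view(-1, self.act_dim, self.hidden)
        i += self.hidden * self.act_dim
        b2 = params[:, i:]
        h = torch.relu(torch.bmm(w1, x.unsqueeze(-1)).squeeze(-1) + b1)
        logits = torch.bmm(w2, h.unsqueeze(-1)).squeeze(-1) + b2
        return logits                                      # action logits of the generated policy

# Multi-task-style usage: one batch mixes observations drawn from games
# with different population sizes, so a single update covers many games.
policy = PolicyHypernetwork(obs_dim=8, act_dim=4)
obs = torch.randn(32, 8)
n = torch.randint(2, 100, (32,))
logits = policy(obs, n)
```

In this sketch, training on a batch that mixes several population sizes is what the multi-task procedure amounts to; the actual objective, sampling scheme, and network sizes in the paper may differ.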