Learning in strategy games (e.g. StarCraft, poker) requires the discovery of diverse policies. This is often achieved by iteratively training new policies against existing ones, growing a policy population that is robust to exploitation. This iterative approach suffers from two issues in real-world games: a) under a finite budget, the approximate best-response operator at each iteration needs truncating, resulting in under-trained good-responses populating the population; b) repeated learning of basic skills at each iteration is wasteful and becomes intractable in the presence of increasingly strong opponents. In this work, we propose Neural Population Learning (NeuPL) as a solution to both issues. NeuPL offers convergence guarantees to a population of best-responses under mild assumptions. By representing a population of policies within a single conditional model, NeuPL enables transfer learning across policies. Empirically, we show the generality, improved performance and efficiency of NeuPL across several test domains. Most interestingly, we show that novel strategies become more accessible, not less, as the neural population expands.
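To make the "single conditional model" idea concrete, the following is a minimal sketch (not the authors' implementation) of how one shared network can represent an entire population: each population member is identified by the mixture of opponents it best-responds to, and that mixture is fed to the network as a conditioning input. The class name `ConditionalPolicy`, the tiny MLP, and the illustrative `meta_graph` are assumptions introduced here for exposition only.

```python
import numpy as np

class ConditionalPolicy:
    """One set of weights shared by the whole population; behaviour is
    selected by conditioning on an opponent-mixture vector."""

    def __init__(self, obs_dim, pop_size, hidden=64, num_actions=4, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + pop_size  # observation + mixture over population members
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, num_actions))

    def act_logits(self, obs, opponent_mixture):
        # opponent_mixture: probability vector over existing population members
        # that this policy is trained to best-respond to.
        x = np.concatenate([obs, opponent_mixture])
        h = np.tanh(x @ self.w1)
        return h @ self.w2

# Example usage: a population of 3 policies sharing one model.
pop_size, obs_dim = 3, 8
policy = ConditionalPolicy(obs_dim, pop_size)

# Hypothetical interaction graph: row i is the opponent mixture that
# population member i responds to (row 0: self-play; later rows: mixtures
# over earlier members).
meta_graph = np.array([[1.0, 0.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [0.5, 0.5, 0.0]])

obs = np.zeros(obs_dim)
for i in range(pop_size):
    logits = policy.act_logits(obs, meta_graph[i])
    print(f"member {i} action logits:", logits)
```

Because all population members share parameters, skills learned while training one member are immediately available to the others, which is the transfer effect the abstract refers to.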