In real-world multi-agent systems, agents with different capabilities may join or leave without altering the team's overarching goals. Coordinating teams with such dynamic composition is challenging because the optimal team strategy varies with the composition. We propose COPA, a coach-player framework to tackle this problem. We assume that the coach has a global view of the environment and coordinates the players, who have only partial views, by distributing individual strategies to them. Specifically, we 1) adopt the attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method that lets the coach decide when to communicate with the players. We validate our methods on a resource collection task, a rescue game, and the StarCraft micromanagement tasks, and we demonstrate zero-shot generalization to new team compositions. Our method achieves performance comparable to or better than the setting in which all players have a full view of the environment. Moreover, with the adaptive communication strategy, performance remains high even when the coach communicates as little as 13% of the time.
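The idea of letting the coach decide when to communicate can be illustrated with a minimal sketch. It assumes, as one plausible design rather than the paper's exact rule, that each player's strategy is a real-valued vector and that the coach re-sends a player's strategy only when its L2 distance from the last transmitted strategy exceeds a threshold. The names `AdaptiveCoach`, `threshold`, and `comm_rate` are hypothetical, and the strategies are passed in as fixed inputs instead of being produced by a learned attention-based coach. The sketch also handles players joining (they always receive a strategy) and leaving (their state is dropped).

```python
import math
from typing import Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length strategy vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class AdaptiveCoach:
    """Hypothetical coach that re-sends a player's strategy only when it has
    drifted more than `threshold` (L2 distance) from the last one sent."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.last_sent: Dict[str, List[float]] = {}  # last strategy sent per player
        self.steps = 0      # number of (player, step) pairs seen
        self.messages = 0   # number of strategies actually transmitted

    def step(self, strategies: Dict[str, List[float]]) -> Dict[str, List[float]]:
        """Return the subset of `strategies` to broadcast at this step."""
        # Forget players that have left the team.
        for pid in list(self.last_sent):
            if pid not in strategies:
                del self.last_sent[pid]
        outgoing: Dict[str, List[float]] = {}
        for pid, z in strategies.items():
            prev = self.last_sent.get(pid)
            # Newly joined players always receive a strategy; existing players
            # receive one only if it changed by more than the threshold.
            if prev is None or l2_distance(z, prev) > self.threshold:
                outgoing[pid] = list(z)
                self.last_sent[pid] = list(z)
        self.steps += len(strategies)
        self.messages += len(outgoing)
        return outgoing

    def comm_rate(self) -> float:
        """Fraction of (player, step) pairs in which a message was sent."""
        return self.messages / self.steps if self.steps else 0.0


# Example: player "b" changes strategy, then leaves, and "c" joins.
coach = AdaptiveCoach(threshold=0.5)
coach.step({"a": [0.0, 0.0], "b": [1.0, 1.0]})  # both new, both sent
coach.step({"a": [0.1, 0.0], "b": [1.0, 2.0]})  # only "b" moved by > 0.5
coach.step({"a": [0.1, 0.0], "c": [3.0, 3.0]})  # "b" left, "c" joined
print(coach.comm_rate())  # 4 messages over 6 (player, step) pairs
```

Comparing against the last *transmitted* strategy, rather than the previous step's strategy, keeps slow drift from accumulating unnoticed; raising `threshold` trades coordination quality for a lower communication rate.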