Ad hoc teamwork is the challenging problem of designing an autonomous agent that can adapt quickly to collaborate with teammates without prior coordination mechanisms such as joint training. Prior work in this area has focused on closed teams, in which the number of agents is fixed. In this work, we consider open teams, in which agents with different fixed policies may enter and leave the environment without prior notification. Our solution builds on graph neural networks to learn agent models and joint-action value models under varying team compositions. We contribute a novel action-value computation that integrates the agent model and the joint-action value model to produce action-value estimates. We empirically demonstrate that our approach successfully models the effects other agents have on the learner, leading to policies that adapt robustly to dynamic team compositions and significantly outperform several alternative methods.
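The abstract does not spell out how the agent model and the joint-action value model are combined. One plausible reading, shown here only as a minimal sketch, is that the learner's value for each of its own actions is the expectation of the joint-action value under the teammates' predicted action distributions. Everything in the block is an assumption made for illustration: the function `learner_action_values`, the toy `matching_value` function, and the probabilities are hypothetical, and the graph neural networks that would produce these quantities are omitted. Because the computation loops over whichever teammates are currently present, it works unchanged when the team size changes.

```python
import itertools
import numpy as np


def learner_action_values(joint_q, teammate_probs, n_actions):
    """Marginalize a joint-action value over predicted teammate actions.

    joint_q: hypothetical callable (learner_action, teammate_actions) -> float,
        standing in for a learned joint-action value model.
    teammate_probs: list with one 1-D probability vector per teammate that is
        currently present, standing in for the agent model's predictions.
    n_actions: number of discrete actions available to every agent.
    """
    n_teammates = len(teammate_probs)
    values = np.zeros(n_actions)
    for a in range(n_actions):
        # Enumerate every combination of teammate actions (exponential in
        # team size, so this is only suitable for small illustrative cases).
        for joint in itertools.product(range(n_actions), repeat=n_teammates):
            # Assumes teammates act independently given the state.
            p = np.prod([teammate_probs[j][aj] for j, aj in enumerate(joint)])
            values[a] += p * joint_q(a, joint)
    return values


def matching_value(learner_action, teammate_actions):
    """Toy joint value: one unit per teammate that picks the learner's action."""
    return float(sum(1 for aj in teammate_actions if aj == learner_action))


# Two teammates present, then one of them leaves without notice.
two_teammates = [np.array([0.9, 0.1]), np.array([0.2, 0.8])]
one_teammate = [np.array([0.2, 0.8])]

q_two = learner_action_values(matching_value, two_teammates, n_actions=2)
q_one = learner_action_values(matching_value, one_teammate, n_actions=2)
print(q_two, q_one)
```

In this toy setting the learner prefers action 0 while both teammates are present and switches to action 1 once the first teammate leaves, which illustrates how estimates of this form can track changes in team composition.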