Ad hoc teamwork poses a challenging problem, requiring the design of an agent to collaborate with teammates without prior coordination or joint training. Open ad hoc teamwork further complicates this challenge by considering environments with a changing number of teammates, referred to as open teams. One promising solution to this problem is leveraging the generalizability of graph neural networks to handle an unrestricted number of agents and effectively address open teams, named graph-based policy learning (GPL). However, its joint Q-value representation over a coordination graph lacks convincing explanations. In this paper, we establish a new theory to understand the joint Q-value representation from the perspective of cooperative game theory, and validate its learning paradigm in open team settings. Building on our theory, we propose a novel algorithm named CIAO compatible with GPL framework, with additional provable implementation tricks that can facilitate learning. The demo of experiments is available on https://sites.google.com/view/ciao2024, and the code of experiments is published on https://github.com/hsvgbkhgbv/CIAO.
翻译:暂无翻译