Ad hoc teamwork poses a challenging problem: designing an agent that collaborates with teammates without prior coordination or joint training. Open ad hoc teamwork (OAHT) further complicates this challenge by considering environments with a changing number of teammates, referred to as open teams. A promising practical approach, graph-based policy learning (GPL), leverages the generalizability of graph neural networks to handle an unrestricted number of agents. However, its joint Q-value representation over a coordination graph lacks a convincing theoretical explanation. In this paper, we establish a new theory that explains the joint Q-value representation for OAHT from the perspective of cooperative game theory and validates its learning paradigm. Building on this theory, we propose a novel algorithm, CIAO, that is compatible with the GPL framework and incorporates additional provably sound implementation tricks that facilitate learning. Demos of the experimental results are available at https://sites.google.com/view/ciao2024, and the experiment code is published at https://github.com/hsvgbkhgbv/CIAO.
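For context, GPL represents the joint Q-value by factorizing it over a coordination graph into individual and pairwise utility terms; a minimal sketch of this standard coordination-graph form (the notation here is illustrative, not the paper's):

$$Q^{\text{jt}}(s, \mathbf{a}) \;=\; \sum_{i \in N} Q_i(a_i \mid s) \;+\; \sum_{\substack{i, j \in N \\ i \neq j}} Q_{i,j}(a_i, a_j \mid s),$$

where $N$ is the (possibly changing) set of agents currently on the team, and both the individual utilities $Q_i$ and pairwise utilities $Q_{i,j}$ are produced from graph neural network embeddings, so the sum remains well defined as agents enter or leave the team.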