Cooperative Multi-Agent Reinforcement Learning (MARL) with permutation-invariant agents has achieved tremendous empirical success in real-world applications. Unfortunately, the theoretical understanding of this MARL problem is lacking, due to the curse of many agents and the limited exploration of relational reasoning in existing works. In this paper, we verify that the transformer implements complex relational reasoning, and we propose and analyze model-free and model-based offline MARL algorithms with transformer approximators. We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents, respectively, which mitigates the curse of many agents. These results are consequences of a novel generalization error bound for the transformer and a novel analysis of the Maximum Likelihood Estimate (MLE) of the system dynamics with the transformer. Our model-based algorithm is the first provably efficient MARL algorithm that explicitly exploits the permutation invariance of the agents.
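The permutation invariance the abstract refers to can be illustrated with a minimal sketch (not the paper's architecture; all names and dimensions here are hypothetical): a self-attention layer applied to per-agent features without positional encodings, followed by mean pooling, produces the same output no matter how the agents are ordered.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # per-agent feature dimension (illustrative)
n_agents = 5   # number of agents (illustrative)

# Fixed projection matrices for queries, keys, and values.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(X):
    """Self-attention over agent features X (n_agents x d), then mean-pool.

    No positional encoding is used, so relabeling the agents only
    permutes the attention matrix's rows and columns; the pooled
    output is unchanged.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(d))   # pairwise attention between agents
    return (A @ V).mean(axis=0)         # pooling discards agent ordering

X = rng.standard_normal((n_agents, d))
perm = rng.permutation(n_agents)

out1 = attention_pool(X)
out2 = attention_pool(X[perm])          # same agents, permuted order
print(np.allclose(out1, out2))          # permutation-invariant output
```

Formally, permuting the rows of X by a permutation matrix P maps the attention output AV to P(AV), and the mean over rows is unaffected, which is why the pooled representation is permutation-invariant.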