The coordination graph is a promising approach to modeling agent collaboration in multi-agent reinforcement learning. It conducts graph-based value factorization and induces explicit coordination among agents to complete complicated tasks. However, one critical challenge in this paradigm is the complexity of greedy action selection with respect to the factorized values. This selection amounts to a decentralized constraint optimization problem (DCOP), which is NP-hard, as is its constant-ratio approximation. To bypass this systematic hardness, this paper proposes a novel method, named Self-Organized Polynomial-time Coordination Graphs (SOP-CG), which uses structured graph classes to guarantee both the accuracy and the computational efficiency of collaborative action selection. SOP-CG employs dynamic graph topologies to ensure sufficient value-function expressiveness, and unifies graph selection into an end-to-end learning paradigm. In experiments, we show that our approach learns succinct and well-adapted graph topologies, induces effective coordination, and improves performance across a variety of cooperative multi-agent tasks.
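To illustrate why structured graph classes avoid the DCOP hardness, here is a minimal sketch (the agent count, action sets, and payoff tables are hypothetical, not from the paper): on a tree-structured coordination graph, the joint action maximizing the sum of pairwise edge payoffs can be found exactly in polynomial time by leaf-to-root dynamic programming, whereas on general graphs this maximization is NP-hard.

```python
# Exact greedy joint-action selection on a tree coordination graph.
# Total value Q(a) = sum over edges (i, j) of q[(i, j)][a_i][a_j].
# Hypothetical example: 3 agents on a chain 0 - 1 - 2, 2 actions each.

A = 2  # actions per agent
# Tree edges as (parent, child), rooted at agent 0.
children = {0: [1], 1: [2], 2: []}
# Pairwise payoff tables q[(i, j)][a_i][a_j] (made-up numbers).
q = {
    (0, 1): [[1.0, 0.0], [0.0, 2.0]],
    (1, 2): [[0.5, 1.5], [2.0, 0.0]],
}

def subtree_max(i):
    """Return (m, arg): m[a_i] is the best payoff of the subtree rooted
    at i given agent i plays a_i; arg[a_i] records each child's
    maximizing action so the optimum can be reconstructed."""
    m = [0.0] * A
    arg = [dict() for _ in range(A)]
    for c in children[i]:
        mc, argc = subtree_max(c)
        for ai in range(A):
            vals = [q[(i, c)][ai][ac] + mc[ac] for ac in range(A)]
            ac_star = max(range(A), key=lambda ac: vals[ac])
            m[ai] += vals[ac_star]
            arg[ai][c] = (ac_star, argc)
    return m, arg

def select_joint_action():
    """One bottom-up pass computes the optimum; one top-down pass
    reads off each agent's action -- polynomial in agents and actions."""
    m, arg = subtree_max(0)
    a0 = max(range(A), key=lambda ai: m[ai])
    actions = {0: a0}
    stack = [(0, arg[a0])]
    while stack:
        i, arg_i = stack.pop()
        for c, (ac, argc) in arg_i.items():
            actions[c] = ac
            stack.append((c, argc[ac]))
    return actions, m[a0]

actions, value = select_joint_action()
# Here the optimum is actions {0: 1, 1: 1, 2: 0} with value 4.0.
```

On a general (loopy) graph no such single-pass ordering exists, which is exactly the hardness the structured graph classes in SOP-CG are designed to sidestep.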