This paper investigates the multi-agent navigation problem, which requires multiple agents to reach their target goals within a limited time. Multi-agent reinforcement learning (MARL) has shown promising results for solving this issue. However, it is inefficient for MARL to directly explore the (nearly) optimal policy in the large search space, which is exacerbated as the number of agents increases (e.g., 10+ agents) or the environment becomes more complex (e.g., a 3D simulator). Goal-conditioned hierarchical reinforcement learning (HRL) provides a promising direction to tackle this challenge by introducing a hierarchical structure to decompose the search space, where the low-level policy predicts primitive actions under the guidance of goals derived from the high-level policy. In this paper, we propose Multi-Agent Graph-Enhanced Commander-Executor (MAGE-X), a graph-based goal-conditioned hierarchical method for multi-agent navigation tasks. MAGE-X comprises a high-level Goal Commander and a low-level Action Executor. The Goal Commander predicts the probability distribution over goals and leverages it to assign each agent the most appropriate final target. The Action Executor utilizes graph neural networks (GNN) to construct a subgraph for each agent that contains only crucial partners, improving cooperation. Additionally, the Goal Encoder in the Action Executor captures the relationship between the agent and the designated goal to encourage the agent to reach the final target. The results show that MAGE-X outperforms state-of-the-art MARL baselines, achieving a 100% success rate with only 3 million training steps in multi-agent particle environments (MPE) with 50 agents, and at least a 12% higher success rate with 2x higher data efficiency in a more complicated quadrotor 3D navigation task.
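To make the commander-executor decomposition concrete, the following is a minimal, illustrative sketch in PyTorch, assuming hypothetical module and variable names (GoalCommander, ActionExecutor, adjacency, etc.) that are not taken from the authors' released code. The goal-assignment rule, the subgraph mask, and the single mean-aggregation message-passing step are simplified placeholders for the paper's actual components.

```python
# Illustrative sketch of a commander-executor hierarchy (hypothetical names;
# not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GoalCommander(nn.Module):
    """High-level policy: scores every (agent, goal) pair and outputs a
    probability distribution over goals for each agent."""

    def __init__(self, obs_dim: int, goal_dim: int, hidden: int = 64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, agent_obs: torch.Tensor, goals: torch.Tensor) -> torch.Tensor:
        # agent_obs: [n_agents, obs_dim], goals: [n_goals, goal_dim]
        n_agents, n_goals = agent_obs.size(0), goals.size(0)
        pairs = torch.cat(
            [
                agent_obs.unsqueeze(1).expand(-1, n_goals, -1),
                goals.unsqueeze(0).expand(n_agents, -1, -1),
            ],
            dim=-1,
        )
        logits = self.score(pairs).squeeze(-1)        # [n_agents, n_goals]
        return F.softmax(logits, dim=-1)              # goal distribution per agent


class ActionExecutor(nn.Module):
    """Low-level policy: one message-passing step over a subgraph of
    'crucial partners', fused with an encoding of the assigned goal."""

    def __init__(self, obs_dim: int, goal_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.node_enc = nn.Linear(obs_dim, hidden)
        self.goal_enc = nn.Linear(goal_dim, hidden)   # stands in for the Goal Encoder
        self.policy = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, agent_obs, assigned_goal, adjacency):
        # adjacency: [n_agents, n_agents] subgraph mask (crucial partners only)
        h = torch.relu(self.node_enc(agent_obs))
        deg = adjacency.sum(-1, keepdim=True).clamp(min=1.0)
        h = adjacency @ h / deg                       # mean aggregation over partners
        g = torch.relu(self.goal_enc(assigned_goal))
        return self.policy(torch.cat([h, g], dim=-1)) # action logits per agent


if __name__ == "__main__":
    n_agents, n_goals, obs_dim, goal_dim, n_actions = 5, 5, 8, 2, 5
    obs = torch.randn(n_agents, obs_dim)
    goals = torch.randn(n_goals, goal_dim)
    commander = GoalCommander(obs_dim, goal_dim)
    executor = ActionExecutor(obs_dim, goal_dim, n_actions)
    probs = commander(obs, goals)                     # [n_agents, n_goals]
    assigned = goals[probs.argmax(dim=-1)]            # greedy goal assignment
    adj = (torch.rand(n_agents, n_agents) > 0.5).float()  # placeholder subgraph
    print(probs.shape, executor(obs, assigned, adj).shape)
```

The sketch only conveys the division of labor: the commander resolves the global agent-goal assignment once, so the executor can condition each agent's primitive actions on a fixed target and a small subgraph of relevant teammates rather than on the full joint observation.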