Toward multi-target self-organizing pursuit in a partially observable Markov game

The multi-target self-organizing pursuit (SOP) problem has wide applications and is considered a challenging self-organization game for distributed systems, in which intelligent agents cooperatively pursue multiple dynamic targets under partial observation. This work proposes a framework for decentralized multi-agent systems that improves implicit coordination in search and pursuit. We model a self-organizing system as a partially observable Markov game (POMG) characterized by large scale, decentralization, partial observation, and the absence of communication. The proposed distributed algorithm, fuzzy self-organizing cooperative coevolution (FSC2), is then used to address the three challenges of multi-target SOP: distributed self-organizing search (SOS), distributed task allocation, and distributed single-target pursuit. FSC2 includes a coordinated multi-agent deep reinforcement learning (MARL) method that enables homogeneous agents to learn natural SOS patterns. In addition, we propose a fuzzy-based distributed task allocation method that locally decomposes multi-target SOP into several single-target pursuit problems. The cooperative coevolution principle is then employed to coordinate the distributed pursuers within each single-target pursuit problem. In this way, the uncertainties caused by the inherent partial observation and distributed decision-making in the POMG are alleviated. The experimental results demonstrate that, by decomposing the SOP task, FSC2 achieves superior performance compared with other implicit coordination policies fully trained by general MARL algorithms. The experiments also demonstrate the scalability of FSC2: up to 2048 FSC2 agents perform efficient multi-target SOP with capture rates of almost 100 percent. Empirical analyses and ablation studies verify the interpretability, rationality, and effectiveness of the component algorithms in FSC2.
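To make the idea of local, fuzzy-based task allocation more concrete, the following is a minimal illustrative sketch, not the membership functions or rules actually used in FSC2. It assumes each pursuer sees only the positions of targets and other pursuers within its sensing range, scores every visible target with two hypothetical fuzzy memberships ("the target is near" and "the target still needs pursuers"), combines them with the fuzzy AND (minimum), and commits to the best-scoring target without exchanging messages. The function names, the linear membership shapes, and the default values `sensing_range=5.0` and `pursuers_needed=4` are all assumptions made for illustration.

```python
import math


def closeness(distance, sensing_range):
    """Assumed membership of 'target is near': 1 at distance 0, 0 at the range limit."""
    return max(0.0, 1.0 - distance / sensing_range)


def need(closer_pursuers, pursuers_needed):
    """Assumed membership of 'target still needs me': drops as closer pursuers appear."""
    return max(0.0, 1.0 - closer_pursuers / pursuers_needed)


def allocate_target(me, targets, others, sensing_range=5.0, pursuers_needed=4):
    """Pick the index of the target this pursuer should chase, or None.

    me      -- (x, y) position of this pursuer
    targets -- list of (x, y) positions of locally observed targets
    others  -- list of (x, y) positions of locally observed pursuers
    Only local observations are used; no communication is assumed.
    """
    best_index, best_score = None, 0.0
    for index, target in enumerate(targets):
        my_distance = math.dist(me, target)
        # Count observed pursuers that are strictly closer to this target than we are.
        closer = sum(1 for p in others if math.dist(p, target) < my_distance)
        # Fuzzy AND (minimum) of the two memberships.
        score = min(closeness(my_distance, sensing_range),
                    need(closer, pursuers_needed))
        if score > best_score:
            best_index, best_score = index, score
    return best_index


if __name__ == "__main__":
    # A lone pursuer prefers the nearer of two visible targets.
    print(allocate_target((0, 0), [(1, 0), (-3, 0)], []))
```

Because every pursuer evaluates the same rule on its own observations, a target that is already surrounded by closer pursuers scores zero for a newcomer, which then turns to a less crowded target. This is one simple way a multi-target problem can split into single-target groups without a central allocator.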
