Multi-agent pursuit-evasion tasks involving intelligent targets are notoriously challenging coordination problems. In this paper, we investigate new ways to learn coordinated behaviors for unmanned aerial vehicles (UAVs) that must keep track of multiple evasive targets. Within a Multi-Agent Reinforcement Learning (MARL) framework, we propose a variant of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) method. Our approach addresses multi-target pursuit-evasion scenarios in non-stationary, unknown environments with random obstacles. In addition, given the critical role that collective exploration plays in detecting possible targets, we assign heterogeneous roles to the pursuers so that exploratory actions are balanced against exploitation (i.e., tracking) of previously identified targets. Our proposed role-based MADDPG algorithm is able not only to track multiple targets but also to explore for possible targets by means of the proposed Voronoi-based reward policy. We implemented, tested, and validated our approach in a simulation environment before deploying it on a real-world multi-robot system comprising Crazyflie drones. Our results demonstrate that a multi-agent pursuit team can learn highly efficient coordinated control policies for target tracking and exploration, even when confronted with multiple fast evasive targets in complex environments.