Efficient UAV Trajectory-Planning using Economic Reinforcement Learning

Advances in unmanned aerial vehicle (UAV) design have opened up applications as varied as surveillance, firefighting, cellular networks, and delivery. In addition, falling costs have made systems that employ fleets of UAVs popular. Deploying UAVs in such systems creates a novel set of trajectory- or path-planning and coordination problems: environments contain many more points of interest (POIs) than UAVs, as well as obstacles and no-fly zones. We introduce REPlanner, a novel multi-agent reinforcement learning algorithm, inspired by economic transactions, that distributes tasks among UAVs. The system is built around an economic mechanism, specifically an auction in which UAVs trade their assigned POIs. We formulate the path-planning problem as a multi-agent economic game in which agents can both cooperate and compete for resources. We then translate the problem into a partially observable Markov decision process (POMDP), which is solved by a reinforcement learning (RL) model deployed on each agent. Because task distributions are computed through cooperation among UAVs, the system is highly resilient to changes in swarm size. Our proposed network and economic-game architecture effectively coordinates the swarm, with coordination arising as an emergent phenomenon while the swarm remains operational. Evaluation results show that REPlanner outperforms conventional RL-based trajectory search.
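The abstract does not spell out the auction rules, so the following is only a minimal sketch of the general idea of UAVs trading assigned POIs, not REPlanner's mechanism (which, per the abstract, relies on an RL model on each agent). It uses a classic sequential auction for task reallocation, assuming that route cost is the Euclidean length of an open path from each UAV's start position, that a UAV's bid for a POI is its cheapest marginal insertion cost, and that a POI changes hands only when the winning bid is below the seller's saving from dropping it. All names (`UAV`, `path_cost`, `best_insertion`, `auction_round`, `reallocate`) are hypothetical.

```python
import math
from dataclasses import dataclass, field

Point = tuple[float, float]  # (x, y) position of a UAV start or a POI


@dataclass
class UAV:
    """Hypothetical agent state: a start position and an ordered list of assigned POIs."""
    name: str
    start: Point
    route: list[Point] = field(default_factory=list)


def path_cost(start: Point, route: list[Point]) -> float:
    """Euclidean length of the open path start -> route[0] -> ... -> route[-1]."""
    cost, prev = 0.0, start
    for p in route:
        cost += math.dist(prev, p)
        prev = p
    return cost


def best_insertion(uav: UAV, poi: Point) -> tuple[float, int]:
    """Bid = cheapest marginal cost of inserting `poi` into the UAV's route, and where."""
    base = path_cost(uav.start, uav.route)
    best = (math.inf, 0)
    for i in range(len(uav.route) + 1):
        candidate = uav.route[:i] + [poi] + uav.route[i:]
        best = min(best, (path_cost(uav.start, candidate) - base, i))
    return best


def auction_round(uavs: list[UAV]) -> int:
    """Each UAV offers each of its POIs; the lowest bidder buys it if that lowers total cost."""
    transfers = 0
    for seller in uavs:
        for poi in list(seller.route):  # iterate over a snapshot; the route may shrink
            idx = seller.route.index(poi)
            remaining = seller.route[:idx] + seller.route[idx + 1:]
            saving = path_cost(seller.start, seller.route) - path_cost(seller.start, remaining)
            bids = [(best_insertion(b, poi), b) for b in uavs if b is not seller]
            if not bids:
                continue
            (price, pos), buyer = min(bids, key=lambda bid: bid[0])
            if price < saving - 1e-9:  # accept strict improvements only
                seller.route = remaining
                buyer.route.insert(pos, poi)
                transfers += 1
    return transfers


def reallocate(uavs: list[UAV], max_rounds: int = 100) -> int:
    """Repeat auction rounds until no POI changes hands; return the number of trades."""
    total = 0
    for _ in range(max_rounds):
        moved = auction_round(uavs)
        total += moved
        if moved == 0:
            break
    return total


# Example: UAV "a" starts with both POIs, including one right next to UAV "b".
a = UAV("a", (0.0, 0.0), [(9.0, 0.0), (1.0, 0.0)])
b = UAV("b", (10.0, 0.0))
trades = reallocate([a, b])
print(trades, a.route, b.route)  # 1 [(1.0, 0.0)] [(9.0, 0.0)]
```

Because every accepted trade costs the buyer less than it saves the seller, each trade strictly lowers the total path length of the fleet, so repeated rounds settle on an assignment that no single POI trade can improve. In the example, the total path length drops from 17 to 2 after one trade.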

翻译：无人驾驶航空器(无人驾驶航空器)设计的进展已开启了各种应用,如监视、消防、蜂窝网络和交付应用等。此外,由于成本下降,使用无人驾驶航空器车队的系统已变得流行。系统的独特性创造了一套全新的轨迹或路径规划和协调问题。环境包括比无人驾驶航空器(无人驾驶航空器)更多的利益点,有障碍和禁飞区。我们引入了REPlanner,这是一种由经济交易启发的新型多试剂强化学习算法,用于分配无人驾驶航空器之间的任务。这个系统围绕一种经济理论,特别是无人驾驶航空器交易分配的拍卖机制。我们把路径规划问题设计成一种多剂经济游戏,使代理人可以合作和竞争资源。然后我们将问题转化为一个部分可观测的马尔科夫决定程序(POMDP),该程序将采用在每一个代理人上部署的强化学习(RL)模式加以解决。随着系统通过无人驾驶飞行器合作计算任务分配情况,它具有很强的弹性,以适应任何变化。我们提议的网络和经济游戏结构能够有效地协调常规搜索结果,同时显示常规飞行模式。