Multi-UAV collision avoidance is a challenging task for UAV swarm applications due to the need for tight cooperation among swarm members during collision-free path planning. Centralized Training with Decentralized Execution (CTDE) in Multi-Agent Reinforcement Learning (MARL) is a promising approach to multi-UAV collision avoidance, where the key challenge is to learn decentralized policies that cooperatively maximize a global reward. We propose a new multi-agent critic-actor learning scheme called MACA for UAV swarm collision avoidance. MACA uses a centralized critic to maximize a discounted global reward that accounts for both safety and energy efficiency, and one actor per UAV to learn a decentralized collision-avoidance policy. To address the credit assignment problem in CTDE, we design a counterfactual baseline that marginalizes both an agent's state and its action, enabling evaluation of each agent's importance in the joint observation-action space. To train and evaluate MACA, we build our own simulation environment, MACAEnv, to closely mimic the realistic behaviors of a UAV swarm. Simulation results show that MACA achieves more than 16% higher average reward than two state-of-the-art MARL algorithms, and reduces the failure rate by 90% and the response time by over 99% compared to a conventional UAV swarm collision avoidance algorithm across all test scenarios.
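To make the credit-assignment idea concrete, below is a minimal Python sketch of a COMA-style counterfactual baseline extended to marginalize an agent's observation as well as its action, in the spirit described above. The function name, the critic interface, and the `default_obs` substitution are assumptions for illustration only; the abstract does not specify MACA's exact formulation.

```python
def counterfactual_advantage(critic, joint_obs, joint_actions, pi_i, i, default_obs):
    """Sketch of a counterfactual baseline that marginalizes agent i's
    observation and action (hypothetical interface; MACA's exact
    construction may differ).

    critic(obs_list, action_list) -> scalar Q-value for the joint
        observation-action pair (the centralized critic).
    pi_i: agent i's policy, a list of probabilities over its discrete actions.
    default_obs: placeholder observation substituted for agent i's own
        observation when forming the baseline (an assumption of this sketch).
    """
    # Critic's value for the actual joint observation-action pair.
    q_joint = critic(joint_obs, joint_actions)

    # Marginalize agent i's state: swap in the default observation.
    cf_obs = [default_obs if j == i else o for j, o in enumerate(joint_obs)]

    # Marginalize agent i's action: average the critic over agent i's
    # alternative actions, weighted by its own policy, while keeping the
    # other agents' actions fixed.
    baseline = 0.0
    for a, p in enumerate(pi_i):
        cf_actions = [a if j == i else u for j, u in enumerate(joint_actions)]
        baseline += p * critic(cf_obs, cf_actions)

    # The advantage measures how much agent i's actual observation-action
    # pair contributes beyond the marginalized baseline, which is the
    # credit-assignment signal used to train that agent's actor.
    return q_joint - baseline
```

Marginalizing the observation in addition to the action distinguishes this baseline from a purely action-level counterfactual, since it isolates the agent's contribution in the joint observation-action space rather than in the action space alone.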