Unmanned aerial vehicles (UAVs) are increasingly used as aerial base stations to provide ad hoc communications infrastructure. Prior research has largely considered static nodes, 2D trajectories, or single-UAV systems; building on that work, this paper focuses on using multiple UAVs to provide wireless communication to mobile users in the absence of terrestrial communications infrastructure. In particular, we jointly optimize the UAVs' 3D trajectories and non-orthogonal multiple access (NOMA) power allocation to maximize system throughput. First, a weighted K-means-based clustering algorithm establishes UAV-user associations at regular intervals. We then explore the efficacy of training a novel Shared Deep Q-Network (SDQN) with action masking. Unlike training each UAV separately with a DQN, the SDQN reduces training time by learning from the experiences of multiple UAVs rather than a single agent. We also show that the SDQN can train a multi-agent system whose agents have differing action spaces. Simulation results confirm that: 1) training a shared DQN outperforms a conventional DQN in terms of maximum system throughput (+20%) and training time (-10%); 2) the SDQN converges for agents with different action spaces, yielding a 9% increase in throughput compared to mutual learning algorithms; and 3) combining NOMA with an SDQN architecture enables the network to achieve a better sum rate than existing baseline schemes.
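The two mechanisms the abstract names, a shared experience pool and action masking, can be illustrated with a minimal sketch. It is not the paper's implementation. The class and function names (SharedReplayBuffer, masked_greedy_action, masked_epsilon_greedy) are hypothetical, as are the tuple layout of a stored transition and the assumption that agents with differing action spaces share one Q-value vector over the union of all actions, with a per-agent boolean mask disabling the entries an agent cannot take.

```python
import random
from collections import deque


class SharedReplayBuffer:
    """One experience pool that every UAV agent writes to and samples from.

    Hypothetical layout: each entry is
    (agent_id, state, action, reward, next_state, next_mask).
    """

    def __init__(self, capacity):
        # The oldest transitions are dropped once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, agent_id, state, action, reward, next_state, next_mask):
        self.buffer.append(
            (agent_id, state, action, reward, next_state, next_mask)
        )

    def sample(self, batch_size, rng=random):
        # A mini-batch mixes transitions from all agents, which is what lets
        # a single network learn from several UAVs at once.
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


def _valid_actions(mask):
    """Indices of the actions that the mask allows."""
    valid = [i for i, allowed in enumerate(mask) if allowed]
    if not valid:
        raise ValueError("mask excludes every action")
    return valid


def masked_greedy_action(q_values, mask):
    """Return the allowed action with the highest Q-value."""
    return max(_valid_actions(mask), key=lambda i: q_values[i])


def masked_epsilon_greedy(q_values, mask, epsilon, rng=random):
    """Explore uniformly over allowed actions with probability epsilon,
    otherwise act greedily over the allowed actions."""
    valid = _valid_actions(mask)
    if rng.random() < epsilon:
        return rng.choice(valid)
    return max(valid, key=lambda i: q_values[i])
```

In this sketch a masked action is never selected, whether the agent is exploring or exploiting, so an agent with a smaller action space can reuse the same Q-value vector as the others. Storing next_mask with each transition is a design choice made here so that a training step could also restrict the bootstrap maximum to valid actions; whether the paper does this is not stated in the abstract.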