In this paper, we investigate a multi-user downlink multiple-input single-output (MISO) unmanned aerial vehicle (UAV) communication system, where a multi-antenna UAV is employed to serve multiple ground terminals. Unlike existing approaches, which focus only on a simplified two-dimensional scenario, this paper considers a three-dimensional (3D) urban environment, where the UAV's 3D trajectory is designed to minimize the data transmission completion time subject to practical throughput and flight movement constraints. Specifically, we propose a deep reinforcement learning (DRL)-based trajectory design for completion time minimization (DRL-TDCTM), which is developed from the deep deterministic policy gradient algorithm. In particular, we introduce an additional piece of information, the merged pheromone, to represent the state of the UAV and the environment and to serve as a reference for the reward, which facilitates the algorithm design. By interacting with the external environment in the corresponding Markov decision process, the proposed algorithm can continuously and adaptively learn how to adjust the UAV's movement strategy. Finally, simulation results show that the proposed DRL-TDCTM algorithm outperforms the conventional baseline methods.