In this work, we optimize the 3D trajectory of an unmanned aerial vehicle (UAV)-based portable access point (PAP) that provides wireless services to a set of ground nodes (GNs). Moreover, we consider a pragmatic non-linear discharge model for the UAV battery, following the Peukert effect. We then formulate a novel problem of maximizing a fairness-based energy efficiency metric, termed fair energy efficiency (FEE). The FEE metric characterizes a system that values both per-user service fairness and the energy efficiency of the PAP. The formulated problem is non-convex with intractable constraints. To solve it, we cast the problem as a Markov decision process (MDP) with continuous state and action spaces. Given the complexity of the solution space, we use the twin delayed deep deterministic policy gradient (TD3) actor-critic deep reinforcement learning (DRL) framework to learn a policy that maximizes the FEE of the system. We perform two types of RL training to demonstrate the effectiveness of our approach: the first (offline) approach keeps the positions of the GNs fixed throughout the training phase, while the second approach generalizes the learned policy to any arrangement of GNs by changing their positions after each training episode. Numerical evaluations show that neglecting the Peukert effect overestimates the air-time of the PAP, which can be addressed by optimally selecting the PAP's flying speed. Moreover, the user fairness, the energy efficiency, and hence the FEE of the system can be improved by efficiently moving the PAP above the GNs. We observe substantial FEE improvements over baseline scenarios of up to 88.31%, 272.34%, and 318.13% for suburban, urban, and dense urban environments, respectively.
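For context, Peukert's law gives the standard non-linear relation between discharge current and effective battery capacity; the symbols below are the conventional ones and not necessarily the paper's notation:

\[
t = H \left( \frac{C}{I H} \right)^{k},
\]

where \(t\) is the achievable discharge time, \(C\) the rated capacity, \(H\) the rated discharge time, \(I\) the discharge current, and \(k \geq 1\) the Peukert constant. For \(k = 1\) the model reduces to the ideal linear discharge \(t = C/I\); with \(k > 1\), higher currents drain the battery disproportionately faster, which is why a linear model overestimates the PAP's air-time and why the flying speed (and thus the drawn current) matters.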
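The abstract does not spell out the FEE metric; one common way to build such a fairness-weighted efficiency measure (an illustrative assumption here, not the paper's exact definition) is to scale the energy efficiency by Jain's fairness index over the per-GN throughputs \(T_1, \dots, T_N\):

\[
J = \frac{\left( \sum_{i=1}^{N} T_i \right)^{2}}{N \sum_{i=1}^{N} T_i^{2}}, \qquad \mathrm{FEE} = J \cdot \frac{\sum_{i=1}^{N} T_i}{E_{\mathrm{total}}},
\]

where \(E_{\mathrm{total}}\) is the energy consumed by the PAP. Jain's index satisfies \(1/N \leq J \leq 1\), with \(J = 1\) only when all GNs receive equal throughput, so maximizing such a metric jointly rewards fairness and energy efficiency.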
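For readers less familiar with TD3, the sketch below illustrates its three core mechanisms (clipped double-Q learning, target policy smoothing, and delayed policy updates) as a single PyTorch-style update step. The networks, replay batch, and all hyperparameters are illustrative placeholders, not the paper's implementation:

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters (placeholders, not the paper's settings).
GAMMA, TAU = 0.99, 0.005             # discount factor, soft target-update rate
POLICY_NOISE, NOISE_CLIP = 0.2, 0.5  # target policy smoothing noise
POLICY_DELAY = 2                     # actor updated every 2 critic updates

def td3_update(step, batch, actor, actor_t, critic1, critic2,
               critic1_t, critic2_t, actor_opt, critic_opt, max_action):
    """One TD3 update from a replay batch of transition tensors."""
    state, action, reward, next_state, done = batch

    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped noise.
        noise = (torch.randn_like(action) * POLICY_NOISE).clamp(-NOISE_CLIP, NOISE_CLIP)
        next_action = (actor_t(next_state) + noise).clamp(-max_action, max_action)
        # Clipped double-Q: bootstrap from the minimum of the two target critics.
        target_q = torch.min(critic1_t(next_state, next_action),
                             critic2_t(next_state, next_action))
        target_q = reward + (1.0 - done) * GAMMA * target_q

    # Regress both critics onto the shared target.
    critic_loss = (nn.functional.mse_loss(critic1(state, action), target_q)
                   + nn.functional.mse_loss(critic2(state, action), target_q))
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Delayed updates: refresh the actor and all targets every POLICY_DELAY steps.
    if step % POLICY_DELAY == 0:
        actor_loss = -critic1(state, actor(state)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
        for net, net_t in ((actor, actor_t), (critic1, critic1_t), (critic2, critic2_t)):
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.data.mul_(1.0 - TAU).add_(TAU * p.data)
```

In the paper's setting, the state would encode the PAP's position (and possibly its remaining energy and the GNs' layout), the action the continuous 3D velocity command, and the reward a proxy for FEE; those bindings are problem-specific and are left abstract here.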