This paper investigates an unmanned aerial vehicle (UAV)-assisted wireless-powered mobile-edge computing (MEC) system, in which the UAV powers the mobile terminals through wireless power transfer (WPT) and also provides them with computation services. We aim to maximize the computation rate of the terminals while ensuring fairness among them. To account for the random trajectories of the mobile terminals, we propose a soft actor-critic (SAC)-based UAV trajectory planning and resource allocation (SAC-TR) algorithm, which combines off-policy learning with maximum-entropy reinforcement learning to promote convergence. We design the reward as a composite function of the computation rate, fairness, and whether the UAV reaches its destination. Simulation results show that SAC-TR quickly adapts to varying network environments and outperforms representative benchmarks in a variety of scenarios.
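The abstract does not give the exact form of the reward, so the following is only a minimal sketch of how a composite reward of this kind could be assembled. It assumes, hypothetically, that fairness is measured with Jain's fairness index, that the three terms are combined additively, and that the weights `w_rate`, `w_fair`, and the terminal bonus `dest_bonus` are free tuning parameters; none of these names or choices come from the paper.

```python
from typing import Sequence


def jain_fairness(rates: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), which lies in [1/n, 1]."""
    n = len(rates)
    sq = sum(r * r for r in rates)
    if n == 0 or sq == 0.0:
        # Assumed convention: no computation at all earns no fairness credit.
        return 0.0
    total = sum(rates)
    return total * total / (n * sq)


def composite_reward(
    rates: Sequence[float],
    reached_destination: bool,
    w_rate: float = 1.0,      # hypothetical weight on the sum computation rate
    w_fair: float = 1.0,      # hypothetical weight on the fairness index
    dest_bonus: float = 10.0,  # hypothetical bonus when the UAV reaches its destination
) -> float:
    """Hypothetical per-step reward: weighted sum rate + weighted fairness + destination bonus."""
    reward = w_rate * sum(rates) + w_fair * jain_fairness(rates)
    if reached_destination:
        reward += dest_bonus
    return reward
```

For example, `composite_reward([2.0, 2.0], reached_destination=True)` returns `15.0` (sum rate 4.0, fairness 1.0, bonus 10.0). An additive form keeps each objective's contribution easy to inspect; a multiplicative coupling of rate and fairness would be an equally plausible alternative.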