This paper studies the trajectory control and task offloading (TCTO) problem in an unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) system, where a UAV flies along a planned trajectory to collect computation tasks from smart devices (SDs). We consider a scenario in which the SDs are not directly connected to the base station (BS), and the UAV can act in two roles: MEC server or wireless relay. The UAV makes task offloading decisions online: each collected task is either executed locally on the UAV or offloaded to the BS for remote processing. The TCTO problem is a multi-objective optimization problem whose objectives are to simultaneously minimize the task delay, minimize the UAV's energy consumption, and maximize the number of tasks collected by the UAV. The problem is challenging because these three objectives conflict with each other. Existing reinforcement learning (RL) algorithms, whether single-objective or single-policy multi-objective, cannot address the problem well because they cannot output multiple policies for different preferences (i.e., weights) across objectives in a single run. This paper adapts evolutionary multi-objective RL (EMORL), a multi-policy multi-objective RL algorithm, to the TCTO problem. The algorithm outputs multiple optimal policies in a single run, each optimized for a particular preference. Simulation results demonstrate that, compared with two evolutionary and two multi-policy RL algorithms, the proposed algorithm obtains more high-quality nondominated policies, striking a balance between the three objectives.
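To make the terms "nondominated policy" and "preference (i.e., weights)" concrete, the following is a minimal sketch and not the EMORL algorithm itself. It assumes each policy is summarized by a hypothetical objective vector (task delay, UAV energy, tasks collected), uses made-up numbers, and uses a simple weighted sum as an assumed way to apply a preference; none of these names or values come from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical objective vector: (task_delay, uav_energy, tasks_collected).
# Delay and energy are minimized; the number of collected tasks is maximized.
Objectives = Tuple[float, float, float]


def to_minimization(obj: Objectives) -> Objectives:
    """Negate the maximized objective so that all three are minimized."""
    delay, energy, tasks = obj
    return (delay, energy, -tasks)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and better on one."""
    a_min, b_min = to_minimization(a), to_minimization(b)
    no_worse = all(x <= y for x, y in zip(a_min, b_min))
    strictly_better = any(x < y for x, y in zip(a_min, b_min))
    return no_worse and strictly_better


def nondominated(points: Sequence[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates (input order kept)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def best_for_preference(
    points: Sequence[Objectives], weights: Tuple[float, float, float]
) -> Objectives:
    """Pick the point with the lowest weighted sum (assumed scalarization)."""
    return min(
        points,
        key=lambda p: sum(w * x for w, x in zip(weights, to_minimization(p))),
    )


# Made-up objective vectors for four candidate policies.
candidates: List[Objectives] = [
    (2.0, 5.0, 10.0),  # low delay
    (3.0, 4.0, 12.0),  # low energy, more tasks
    (2.5, 6.0, 9.0),   # worse than the first on all three objectives
    (4.0, 7.0, 12.0),  # worse than the second on delay and energy
]

front = nondominated(candidates)
delay_focused = best_for_preference(front, (1.0, 0.0, 0.0))
energy_focused = best_for_preference(front, (0.0, 1.0, 0.0))
```

In this toy data the first two candidates trade delay against energy, so neither dominates the other and both stay on the front; different weight vectors then select different policies from it, which is why a single-policy method would need one run per preference.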