Multi-access edge computing (MEC) aims to extend cloud services to the network edge to reduce network traffic and service latency. A fundamental problem in MEC is how to efficiently offload the heterogeneous tasks of mobile applications from user equipment (UE) to MEC hosts. Recently, many deep reinforcement learning (DRL) based methods have been proposed to learn offloading policies by interacting with the MEC environment, which consists of the UE, wireless channels, and MEC hosts. However, these methods adapt poorly to new environments: their sample efficiency is low, and they require full retraining to learn updated policies. To overcome this weakness, we propose a task offloading method based on meta reinforcement learning, which can adapt quickly to new environments with only a small number of gradient updates and samples. We model mobile applications as Directed Acyclic Graphs (DAGs) and represent the offloading policy with a custom sequence-to-sequence (seq2seq) neural network. To train the seq2seq network efficiently, we propose a method that combines a first-order approximation with a clipped surrogate objective. Experimental results demonstrate that the proposed method reduces latency by up to 25% compared with three baselines while adapting quickly to new environments.
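To make the training idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: the inner loop adapts a per-task copy of the policy with the clipped surrogate objective (as in PPO), and the outer loop applies a first-order meta-update (here in the Reptile style) that moves the meta-parameters toward the adapted ones, so no second derivatives are needed. The policy and task interfaces (`log_prob`, `sample_batch`) and all hyperparameters are illustrative assumptions.

```python
# Sketch only: inner-loop PPO-style clipped surrogate + first-order meta-update.
import copy
import torch

def clipped_surrogate_loss(policy, old_log_probs, states, actions, advantages,
                           clip_eps=0.2):
    """Clipped surrogate objective (returned negated so it can be minimized)."""
    log_probs = policy.log_prob(states, actions)           # assumed policy API
    ratio = torch.exp(log_probs - old_log_probs)           # importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

def meta_update(meta_policy, tasks, inner_steps=3, inner_lr=1e-3, meta_lr=0.1):
    """One meta-training step: adapt a copy of the policy to each sampled task,
    then move the meta-parameters toward the averaged adapted parameters."""
    adapted_params = []
    for task in tasks:
        policy = copy.deepcopy(meta_policy)                # task-specific copy
        opt = torch.optim.Adam(policy.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            batch = task.sample_batch(policy)              # assumed env API
            loss = clipped_surrogate_loss(policy, batch.old_log_probs,
                                          batch.states, batch.actions,
                                          batch.advantages)
            opt.zero_grad()
            loss.backward()
            opt.step()
        adapted_params.append([p.detach() for p in policy.parameters()])
    # First-order outer update: no backprop through the inner-loop updates.
    with torch.no_grad():
        for i, p in enumerate(meta_policy.parameters()):
            mean_adapted = torch.stack([ap[i] for ap in adapted_params]).mean(0)
            p.add_(meta_lr * (mean_adapted - p))
```

Because the outer update only differences parameter vectors, it avoids the Hessian-vector products of full second-order meta-gradients, which is what makes fast adaptation with few gradient updates tractable in practice.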