Mobile Edge Computing (MEC) has been regarded as a promising paradigm for reducing the service latency of data processing in the Internet of Things (IoT) by provisioning computing resources at the network edge. In this work, we jointly optimize task partitioning and computational power allocation for computation offloading in a dynamic environment with multiple IoT devices and multiple edge servers. We formulate the problem as a Markov decision process with a constrained hybrid action space, which cannot be well handled by existing deep reinforcement learning (DRL) algorithms. We therefore develop a novel DRL algorithm, called Dirichlet Deep Deterministic Policy Gradient (D3PG), which builds on Deep Deterministic Policy Gradient (DDPG) to solve the problem. The developed model learns to solve a multi-objective optimization that maximizes the number of tasks processed before expiration while minimizing energy cost and service latency. More importantly, D3PG can effectively handle a constrained distribution-continuous hybrid action space, in which the distribution variables govern task partitioning and offloading while the continuous variables control computational frequency. Moreover, D3PG is applicable to many similar problems in MEC and in general reinforcement learning. Extensive simulation results show that the proposed D3PG outperforms state-of-the-art methods.
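To make the constrained hybrid action space concrete, below is a minimal sketch (not the authors' exact architecture) of a DDPG-style actor whose output satisfies both constraints at once: a point on the probability simplex (task-partition fractions over local execution and the edge servers) and a bounded continuous CPU frequency. The names `HybridActor`, `num_servers`, `f_min`, and `f_max`, and all layer sizes, are illustrative assumptions; the paper's Dirichlet mechanism, from which D3PG takes its name, would sit on top of this, e.g., by sampling exploration noise from a Dirichlet distribution concentrated at the actor's partition output.

```python
# Hypothetical sketch of a constrained hybrid-action actor, assuming PyTorch.
import torch
import torch.nn as nn

class HybridActor(nn.Module):
    def __init__(self, state_dim, num_servers, f_min=0.5e9, f_max=2.0e9):
        super().__init__()
        self.f_min, self.f_max = f_min, f_max  # frequency bounds (Hz), illustrative
        self.backbone = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        # Head 1: logits for the partition over (local + num_servers) targets.
        self.partition_head = nn.Linear(128, num_servers + 1)
        # Head 2: a single scalar for the computational frequency.
        self.freq_head = nn.Linear(128, 1)

    def forward(self, state):
        h = self.backbone(state)
        # Softmax keeps the partition on the simplex: non-negative, sums to 1.
        partition = torch.softmax(self.partition_head(h), dim=-1)
        # Sigmoid squashes the frequency into [f_min, f_max].
        freq = self.f_min + (self.f_max - self.f_min) * torch.sigmoid(self.freq_head(h))
        return partition, freq

# Usage: a batch of 4 states, 3 edge servers; check the simplex constraint holds.
actor = HybridActor(state_dim=10, num_servers=3)
partition, freq = actor(torch.randn(4, 10))
assert torch.allclose(partition.sum(dim=-1), torch.ones(4))
```

The design point this illustrates is that the constraints are enforced by the output parameterization itself (softmax for the distribution variables, a scaled sigmoid for the continuous one), so the policy never emits an infeasible action and no projection or clipping step is needed during training.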