Stable Online Computation Offloading via Lyapunov-guided Deep Reinforcement Learning

In this paper, we consider a multi-user mobile-edge computing (MEC) network with time-varying wireless channels and stochastic user task data arrivals in sequential time frames. In particular, we aim to design an online computation offloading algorithm that maximizes the network data processing capability subject to long-term data queue stability and average power constraints. The online algorithm is practical in the sense that the decisions for each time frame are made without knowledge of future channel conditions or data arrivals. We formulate the problem as a multi-stage stochastic mixed-integer nonlinear programming (MINLP) problem that jointly determines the binary offloading decisions (each user computes its task either locally or at the edge server) and the system resource allocation in sequential time frames. To address the coupling among the decisions of different time frames, we propose a novel framework, named LyDROO, that combines the advantages of Lyapunov optimization and deep reinforcement learning (DRL). Specifically, LyDROO first applies Lyapunov optimization to decouple the multi-stage stochastic MINLP into deterministic per-frame MINLP subproblems of much smaller size. Then, it integrates model-based optimization and model-free DRL to solve the per-frame MINLP problems with very low computational complexity. Simulation results show that the proposed LyDROO achieves optimal computation performance while satisfying all the long-term constraints. It also incurs very low execution latency, which makes it particularly suitable for real-time implementation in fast-fading environments.
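
To make the decoupling idea concrete, the following is a minimal Python sketch of a Lyapunov-style per-frame decision loop. It is not the LyDROO algorithm: the rate and energy model, the constants `LOCAL_RATE`, `EDGE_RATE`, `LOCAL_ENERGY`, `OFF_ENERGY`, and all function names are assumptions made for illustration, and the DRL-based solver of the per-frame problem is replaced here by brute-force enumeration over the binary offloading decisions, which is only feasible for a handful of users. What the sketch does show is the general pattern: data queues and virtual energy queues turn the long-term constraints into weights of a deterministic problem that is solved independently in each frame.

```python
import itertools
import random

# Hypothetical toy constants (not taken from the paper).
LOCAL_RATE = 1.0    # data processed per frame when computing locally
EDGE_RATE = 4.0     # total edge capacity per frame, shared by offloading users
LOCAL_ENERGY = 0.2  # energy per frame for local computing
OFF_ENERGY = 1.0    # energy per frame for offloading


def queue_update(q, served, arrived):
    """Data queue dynamics: Q(t+1) = max(Q(t) - served, 0) + arrived."""
    return max(q - served, 0.0) + arrived


def virtual_queue_update(y, energy, budget):
    """Virtual energy queue: Y(t+1) = max(Y(t) + energy - budget, 0)."""
    return max(y + energy - budget, 0.0)


def frame_rates(x, gains):
    """Toy rate model: offloading users split the edge capacity equally,
    scaled by their own channel gain; local users get a fixed rate."""
    n_off = sum(x)
    return [EDGE_RATE * g / n_off if xi == 1 else LOCAL_RATE
            for xi, g in zip(x, gains)]


def per_frame_decision(Q, Y, gains, V):
    """Pick the binary offloading vector maximizing a drift-plus-penalty
    style score for the current frame only (brute force over 2^n options)."""
    def score(x):
        rates = frame_rates(x, gains)
        total = 0.0
        for i, xi in enumerate(x):
            served = min(rates[i], Q[i])  # cannot process more than is queued
            energy = OFF_ENERGY if xi == 1 else LOCAL_ENERGY
            total += (Q[i] + V) * served - Y[i] * energy
        return total

    return max(itertools.product([0, 1], repeat=len(Q)), key=score)


def simulate(T=200, n=3, V=10.0, budget=0.5, seed=0):
    """Run T frames with random channel gains and arrivals (toy model)."""
    rng = random.Random(seed)
    Q = [0.0] * n
    Y = [0.0] * n
    for _ in range(T):
        gains = [rng.uniform(0.1, 1.0) for _ in range(n)]
        arrivals = [rng.uniform(0.0, 1.5) for _ in range(n)]
        x = per_frame_decision(Q, Y, gains, V)
        rates = frame_rates(x, gains)
        for i in range(n):
            served = min(rates[i], Q[i])
            energy = OFF_ENERGY if x[i] == 1 else LOCAL_ENERGY
            Q[i] = queue_update(Q[i], served, arrivals[i])
            Y[i] = virtual_queue_update(Y[i], energy, budget)
    return Q, Y
```

In this sketch, a larger `V` weights processed data more heavily relative to the energy queues, while a growing virtual queue `Y[i]` pushes user `i` toward the lower-energy mode. Each frame's decision uses only the current queues and channel gains, with no knowledge of future frames.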
