This paper proposes an effective and novel multi-agent deep reinforcement learning (MADRL)-based method for solving the joint virtual network function (VNF) placement and routing (P&R) problem, where multiple service requests with differentiated demands are delivered at the same time. The differentiated demands of the service requests are reflected by their delay- and cost-sensitive factors. We first formulate a joint VNF P&R problem that minimizes a weighted sum of service delay and resource consumption cost, and show that it is NP-complete. The joint problem is then decoupled into two iterative subtasks: a placement subtask and a routing subtask, each consisting of multiple concurrent parallel sequential decision processes. By invoking the deep deterministic policy gradient method and multi-agent techniques, an MADRL-P&R framework is designed to perform the two subtasks. A new joint-reward and internal-reward mechanism is proposed to match the goals and constraints of the placement and routing subtasks. We also propose a parameter-migration-based model-retraining method to handle changing network topologies. Experiments corroborate that the proposed MADRL-P&R framework outperforms its alternatives in terms of service cost and delay, and offers higher flexibility for personalized service demands. The parameter-migration-based model-retraining method efficiently accelerates convergence under moderate network-topology changes.
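The weighted-sum objective mentioned above can be sketched as follows; the notation here is illustrative only, as the abstract does not give the paper's exact symbols. For each service request $s$ in the request set $\mathcal{S}$, let $\alpha_s$ and $\beta_s$ denote its delay- and cost-sensitive factors, $D_s$ its end-to-end service delay, and $C_s$ its resource consumption cost:

$$
\min_{\text{placement},\,\text{routing}} \;\; \sum_{s \in \mathcal{S}} \left( \alpha_s\, D_s + \beta_s\, C_s \right)
$$

Under this reading, a delay-critical request would carry a large $\alpha_s$ relative to $\beta_s$, while a cost-constrained request would do the reverse, which is how differentiated (personalized) demands enter a single joint optimization.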