Modern communication networks have become highly complex and dynamic, which makes them hard to model, predict, and control. In this paper, we develop a novel experience-driven approach that learns to control a communication network well from its own experience rather than from an accurate mathematical model, just as a human learns a new skill (such as driving or swimming). Specifically, we propose, for the first time, to leverage emerging Deep Reinforcement Learning (DRL) to enable model-free control in communication networks, and we present a novel and highly effective DRL-based control framework, DRL-TE, for a fundamental networking problem: Traffic Engineering (TE). The proposed framework maximizes a widely used utility function by jointly learning the network environment and its dynamics and making decisions under the guidance of powerful Deep Neural Networks (DNNs). We propose two new techniques, TE-aware exploration and actor-critic-based prioritized experience replay, to optimize the general DRL framework specifically for TE. To validate and evaluate the proposed framework, we implemented it in ns-3 and tested it comprehensively on both representative and randomly generated network topologies. Extensive packet-level simulation results show that 1) compared to several widely used baseline methods, DRL-TE significantly reduces end-to-end delay and consistently improves network utility, while offering better or comparable throughput; 2) DRL-TE is robust to network changes; and 3) DRL-TE consistently outperforms a state-of-the-art DRL method for continuous control, Deep Deterministic Policy Gradient (DDPG), which on its own does not offer satisfactory performance.