MAMRL: Exploiting Multi-agent Meta Reinforcement Learning in WAN Traffic Engineering

Traffic optimization challenges such as load balancing, flow scheduling, and reducing packet delivery time are difficult online decision-making problems in wide area networks (WANs). For instance, complex heuristics are needed to find optimal paths that improve packet delivery time and minimize interruptions caused by link failures or congestion. The recent success of reinforcement learning (RL) algorithms offers a way to build more robust systems that learn from experience in model-free settings. In this work, we consider a path optimization problem, specifically packet routing, in large complex networks. We develop and evaluate a model-free approach that applies multi-agent meta reinforcement learning (MAMRL) to determine the next hop of each packet so that it is delivered to its destination in minimum overall time. Specifically, we propose to leverage and compare deep policy optimization RL algorithms for distributed model-free control in communication networks, and we present a novel meta-learning-based framework, MAMRL, that enables quick adaptation to topology changes. We evaluate the proposed framework in simulation on various WAN topologies. Our extensive packet-level simulation results show that, compared to classical shortest-path and traditional reinforcement learning approaches, MAMRL significantly reduces the average packet delivery time even when network demand increases. Compared to a non-meta deep policy optimization algorithm, MAMRL reduces packet loss in far fewer episodes when link failures occur, while offering comparable average packet delivery time.
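
To make the problem setting concrete, the following is a minimal sketch of distributed next-hop routing with one learning agent per node. It is not the MAMRL framework: it uses a tabular, Q-routing-style delay estimate as a stand-in for the deep policy optimization and meta-learning components described above, and it compares the learned route against a breadth-first shortest-path baseline. The five-node topology, unit link delays, learning rate, exploration rate, and all names (`NodeAgent`, `route_packet`, `train`, `shortest_hops`) are assumptions made for illustration only.

```python
import random
from collections import defaultdict, deque

# Hypothetical 5-node topology (adjacency list); not taken from the paper.
TOPOLOGY = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


class NodeAgent:
    """One agent per node: chooses a next hop for each packet it holds."""

    def __init__(self, name, neighbors, lr=0.5, epsilon=0.1):
        self.name = name
        self.neighbors = list(neighbors)
        self.lr = lr            # assumed learning rate
        self.epsilon = epsilon  # assumed exploration probability
        # q[dest][neighbor]: estimated remaining delivery time via neighbor
        self.q = defaultdict(lambda: {n: 0.0 for n in self.neighbors})

    def best_estimate(self, dest):
        """Lowest estimated remaining delivery time from this node to dest."""
        if self.name == dest:
            return 0.0
        return min(self.q[dest][n] for n in self.neighbors)

    def next_hop(self, dest, rng, explore=True):
        """Epsilon-greedy choice of the neighbor with the lowest estimate."""
        if explore and rng.random() < self.epsilon:
            return rng.choice(self.neighbors)
        table = self.q[dest]
        return min(self.neighbors, key=lambda n: table[n])

    def update(self, dest, neighbor, link_delay, neighbor_estimate):
        """Move the estimate toward link delay plus the neighbor's estimate."""
        target = link_delay + neighbor_estimate
        self.q[dest][neighbor] += self.lr * (target - self.q[dest][neighbor])


def route_packet(agents, src, dest, rng, explore=True, max_hops=50):
    """Forward one packet hop by hop; return hop count, or None if dropped."""
    node, hops = src, 0
    while node != dest and hops < max_hops:
        agent = agents[node]
        nxt = agent.next_hop(dest, rng, explore)
        # Unit link delay is an assumption of this sketch.
        agent.update(dest, nxt, 1.0, agents[nxt].best_estimate(dest))
        node, hops = nxt, hops + 1
    return hops if node == dest else None


def train(topology=TOPOLOGY, dest="E", episodes=500, seed=0):
    """Train all node agents on packets from random sources to dest."""
    rng = random.Random(seed)
    agents = {n: NodeAgent(n, nbrs) for n, nbrs in topology.items()}
    sources = [n for n in topology if n != dest]
    for _ in range(episodes):
        route_packet(agents, rng.choice(sources), dest, rng)
    return agents


def shortest_hops(topology, src, dest):
    """Classical baseline: breadth-first search hop count from src to dest."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dest:
            return dist
        for nbr in topology[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


if __name__ == "__main__":
    agents = train()
    learned = route_packet(agents, "A", "E", random.Random(0), explore=False)
    print("learned hops:", learned, "baseline hops:", shortest_hops(TOPOLOGY, "A", "E"))
```

On a static, uncongested topology such as this one, the learned route can at best match the shortest-path baseline. The motivation for learning-based routing in the abstract lies in the cases this sketch leaves out, namely rising demand and link failures, where per-node estimates can be updated from experience rather than recomputed centrally.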
