Multipath TCP (MPTCP) has emerged as a facilitator for harnessing and pooling available bandwidth in wireless/wireline communication networks and in data centers. Existing implementations of MPTCP, such as the Linked Increase Algorithm (LIA), Opportunistic LIA (OLIA), and BAlanced LInked Adaptation (BALIA), include separate algorithms for congestion control and packet scheduling, with pre-selected control parameters. We propose a Deep Q-Learning (DQL) based framework for joint congestion control and packet scheduling for MPTCP. At the heart of the solution is an intelligent agent for interface, learning, and actuation, which learns an optimal congestion control and scheduling mechanism from experience using DQL techniques with policy gradients. We provide a rigorous stability analysis of the system dynamics, which yields important practical design insights. In addition, the proposed DQL-MPTCP algorithm uses a recurrent neural network integrated with long short-term memory to continuously i) learn the dynamic behavior of subflows (paths) and ii) respond promptly to that behavior using prioritized experience replay. Through extensive emulations, we show that the proposed DQL-based MPTCP algorithm outperforms the MPTCP LIA, OLIA, and BALIA algorithms. Moreover, the DQL-MPTCP algorithm is robust to time-varying network characteristics and provides dynamic exploration and exploitation of paths.
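To make the prioritized experience replay mentioned above concrete, here is a minimal sketch of a proportional prioritized replay buffer. It is not the DQL-MPTCP implementation: the abstract does not specify the replay mechanism, so the class name `PrioritizedReplayBuffer`, the `Transition` fields, and the parameters `alpha` and `eps` are all assumptions chosen for illustration. The idea it shows is that transitions with a larger temporal-difference (TD) error are sampled more often, so the agent revisits the experiences it predicted worst.

```python
import random
from collections import namedtuple

# Hypothetical transition record; in an MPTCP setting the state might hold
# per-subflow measurements such as RTT or congestion window (an assumption).
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class PrioritizedReplayBuffer:
    """Illustrative proportional prioritized replay (names are assumptions)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3, seed=None):
        self.capacity = capacity      # maximum number of stored transitions
        self.alpha = alpha            # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps                # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.next_slot = 0            # ring-buffer write position
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions receive the current maximum priority so that they
        # are likely to be sampled soon after insertion.
        priority = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(priority)
        else:
            # Buffer full: overwrite the oldest entry.
            self.items[self.next_slot] = transition
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size):
        # Sample index i with probability proportional to priority_i ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        indices = self.rng.choices(range(len(self.items)), weights=weights, k=batch_size)
        return indices, [self.items[i] for i in indices]

    def update(self, indices, td_errors):
        # After a learning step, set each priority to the new absolute TD error.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, one would call `add` after each interaction with the network, `sample` to draw a batch for the Q-network update, and `update` with the resulting TD errors. A full implementation would typically also apply importance-sampling weights to correct the bias that non-uniform sampling introduces; that step is omitted here for brevity.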