Fast and efficient transport protocols are the foundation of an increasingly distributed world. The burden of continuously delivering improved communication performance to support next-generation applications and services, combined with the increasing heterogeneity of systems and network technologies, has prompted the design of Congestion Control (CC) algorithms that perform well in specific environments. The challenge of designing a generic CC algorithm that can adapt to a broad range of scenarios is still an open research question. To tackle this challenge, we propose to apply a novel Reinforcement Learning (RL) approach. Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return, and models the learning process as an infinite-horizon task. We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch that researchers have encountered when applying RL to CC. We evaluated our solution on the task of file transfer and compared it to TCP Cubic. While further research is required, results have shown that MARLIN can achieve results comparable to TCP Cubic with little hyperparameter tuning, in a task significantly different from its training setting. Therefore, we believe that our work represents a promising first step toward building CC algorithms based on the maximum entropy RL framework.