As deep reinforcement learning (RL) showcases its strengths in networking and systems, its pitfalls have also come to public attention: when trained to handle a wide range of network workloads and previously unseen deployment environments, RL policies often exhibit suboptimal performance and poor generalizability. To tackle these problems, we present Genet, a new training framework for learning better RL-based network adaptation algorithms. Genet is built on the concept of curriculum learning, which has proved effective against similar issues in other domains where RL is extensively employed. At a high level, curriculum learning presents increasingly difficult environments during training, rather than choosing them randomly, so that the current RL model can make meaningful progress. However, applying curriculum learning in networking is challenging because it remains unknown how to measure the "difficulty" of a network environment. Instead of relying on handcrafted heuristics to determine an environment's difficulty level, our insight is to utilize traditional rule-based (non-RL) baselines: if the current RL model performs significantly worse in a network environment than the baselines, then the model's potential to improve when further trained in this environment is substantial. Therefore, Genet automatically searches for the environments where the current model falls significantly behind a traditional baseline scheme and iteratively promotes these environments as training progresses. By evaluating Genet on three use cases (adaptive video streaming, congestion control, and load balancing), we show that Genet produces RL policies that outperform both regularly trained RL policies and traditional baselines in each context, not only under synthetic workloads but also in real environments.
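The environment-selection idea described above can be illustrated with a minimal sketch. This is not Genet's implementation: the names `sample_env`, `gap_to_baseline`, and `select_training_envs` are hypothetical, an environment is assumed to be a dictionary of numeric configuration parameters, and plain random search stands in for whatever search procedure the framework actually uses. The two reward callables are assumed to return the reward of the current RL model and of a rule-based baseline in a given environment.

```python
import random
from typing import Callable, Dict, List, Tuple

# Assumption: an environment is described by named numeric parameters.
Env = Dict[str, float]
RewardFn = Callable[[Env], float]


def sample_env(param_ranges: Dict[str, Tuple[float, float]],
               rng: random.Random) -> Env:
    """Draw one environment configuration uniformly from the given ranges."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in param_ranges.items()}


def gap_to_baseline(env: Env, rl_reward: RewardFn,
                    baseline_reward: RewardFn) -> float:
    """How far the RL model falls behind the baseline (positive = RL is worse)."""
    return baseline_reward(env) - rl_reward(env)


def select_training_envs(param_ranges: Dict[str, Tuple[float, float]],
                         rl_reward: RewardFn,
                         baseline_reward: RewardFn,
                         n_candidates: int = 100,
                         n_promote: int = 5,
                         seed: int = 0) -> List[Tuple[float, Env]]:
    """Return the candidates with the largest gap-to-baseline, largest first.

    Random search is used here purely for illustration.
    """
    rng = random.Random(seed)
    candidates = [sample_env(param_ranges, rng) for _ in range(n_candidates)]
    scored = [(gap_to_baseline(env, rl_reward, baseline_reward), env)
              for env in candidates]
    # Sort on the gap only, so ties never fall back to comparing dicts.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n_promote]


if __name__ == "__main__":
    # Toy stand-ins: the "RL model" does well only at high bandwidth,
    # while the "baseline" earns a constant reward everywhere.
    ranges = {"bandwidth_mbps": (0.0, 10.0)}
    promoted = select_training_envs(
        ranges,
        rl_reward=lambda env: env["bandwidth_mbps"] / 10.0,
        baseline_reward=lambda env: 0.5,
    )
    for gap, env in promoted:
        print(f"gap={gap:.3f} env={env}")
```

In a full training loop, the promoted environments would be added to the training distribution, the model trained further, and the selection repeated, which matches the iterative promotion the abstract describes. In the toy usage, the promoted environments are the low-bandwidth ones, since that is where the stand-in RL reward trails the constant baseline most.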