With the continuous growth in communication network complexity and traffic volume, communication load balancing solutions are receiving increasing attention. In particular, reinforcement learning (RL)-based methods have shown impressive performance compared with traditional rule-based methods. However, standard RL methods generally require an enormous amount of training data and generalize poorly to scenarios not encountered during training. To address this, we propose a policy reuse framework in which a policy selector chooses the most suitable pre-trained RL policy to execute based on current traffic conditions. Our method hinges on a policy bank composed of policies trained on a diverse set of traffic scenarios. When deploying to an unknown traffic scenario, we select a policy from the policy bank based on the similarity between the current scenario's previous-day traffic and the traffic observed during training. Experiments demonstrate that this framework outperforms classical and adaptive rule-based methods by a large margin.
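The selection step can be made concrete with a minimal sketch. The snippet below assumes cosine similarity over daily traffic profiles and a dictionary-based policy bank; the names (`policy_bank`, `select_policy`) and the choice of similarity metric are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two daily traffic profiles."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Hypothetical policy bank: scenario name -> (traffic profile seen in training,
# trained policy). Placeholder strings stand in for actual RL policies.
policy_bank = {
    "residential": (np.array([2.0, 1.0, 1.0, 4.0, 8.0, 9.0]), "policy_residential"),
    "business":    (np.array([1.0, 6.0, 9.0, 8.0, 3.0, 1.0]), "policy_business"),
}

def select_policy(prev_day_traffic: np.ndarray):
    """Return the pre-trained policy whose training traffic is most
    similar to the previous-day traffic of the current scenario."""
    best = max(policy_bank,
               key=lambda s: cosine_similarity(policy_bank[s][0], prev_day_traffic))
    return policy_bank[best][1]

# Yesterday's observed traffic resembles the business-district pattern,
# so that policy is reused.
print(select_policy(np.array([1.5, 5.5, 8.0, 9.0, 2.5, 1.0])))  # -> "policy_business"
```

Any profile-level similarity measure (e.g., negative Euclidean distance) could replace cosine similarity here; the key idea is only that selection is driven by matching observed traffic to training traffic.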