Communication load balancing aims to balance the load across the available resources and thereby improve the quality of service of network systems. After formulating load balancing (LB) as a Markov decision process (MDP), reinforcement learning (RL) has recently proven effective in addressing the LB problem. To leverage the benefits of classical RL for load balancing, however, we need an explicit reward definition. Engineering this reward function is challenging: it requires expert knowledge, and there is no general consensus on the form of an optimal reward function. In this work, we tackle the communication load balancing problem with an inverse reinforcement learning (IRL) approach. To the best of our knowledge, this is the first time IRL has been successfully applied to communication load balancing. Specifically, we first infer a reward function from a set of demonstrations, and then learn a reinforcement learning load balancing policy with the inferred reward function. Compared to classical RL-based solutions, the proposed solution is more general and better suited to real-world scenarios. Experimental evaluations on different simulated traffic scenarios show that our method is effective and outperforms other baselines by a considerable margin.
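To make the two-stage pipeline concrete, the following is a minimal toy sketch, not the paper's actual method: it uses a hypothetical 4-state tabular MDP, a crude reward-inference surrogate (scoring each state by its visitation frequency in expert demonstrations), and value iteration in place of a full RL algorithm.

```python
import numpy as np

# Hypothetical tiny MDP standing in for a load-balancing environment.
n_states, n_actions = 4, 2

# Deterministic transitions: action 0 stays put, action 1 moves toward state 3.
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, s] = 1.0
    P[s, 1, min(s + 1, n_states - 1)] = 1.0

# Expert demonstrations: trajectories that head to (and stay in) state 3.
demos = [[0, 1, 2, 3, 3], [1, 2, 3, 3, 3]]

# Stage 1 -- reward inference (visitation-frequency surrogate, an
# assumption of this sketch; real IRL methods are more sophisticated).
counts = np.zeros(n_states)
for traj in demos:
    for s in traj:
        counts[s] += 1
reward = counts / counts.sum()

# Stage 2 -- learn a policy under the inferred reward via value iteration.
gamma, V = 0.9, np.zeros(n_states)
for _ in range(100):
    Q = reward[:, None] + gamma * (P @ V)  # Q has shape (n_states, n_actions)
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)
print(policy)  # the learned policy drives every non-goal state toward state 3
```

The key point the sketch illustrates is the decoupling: once a reward has been inferred from demonstrations, any standard planning or RL method can consume it unchanged.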