In the past few years, Deep Reinforcement Learning (DRL) has become a valuable tool to automatically learn efficient resource management strategies in complex networks. In many scenarios, the learning task is performed in the Cloud, while experience samples are generated directly by edge nodes or users. The learning task therefore involves some data exchange which, in turn, subtracts a certain amount of transmission resources from the system. This creates a tension between two needs: speeding up convergence towards an effective strategy, which requires allocating resources to transmit learning samples, and maximizing the amount of resources available for data plane communication, which maximizes users' Quality of Service (QoS) and requires the learning process to be efficient, i.e., to minimize its overhead. In this paper, we investigate this trade-off and propose a dynamic balancing strategy between the learning and data planes, which allows the centralized learning agent to quickly converge to an efficient resource allocation strategy while minimizing the impact on QoS. Simulation results show that the proposed method outperforms static allocation schemes, converging to the optimal policy (i.e., maximum efficacy and minimum overhead of the learning plane) in the long run.