Modern dynamic and heterogeneous networks present agents with different environments, each with its own state transition probabilities, which leads to the local strategy trap problem in traditional federated reinforcement learning (FRL) based network optimization algorithms. To solve this problem, we propose a novel Differentiated Federated Reinforcement Learning (DFRL) scheme, which evolves the global policy model aggregation and local inference with the global policy model in traditional FRL into a collaborative learning process that learns a global trend model and differentiated local policy models in parallel. In DFRL, each local policy model is adaptively updated using both the global trend model and its local environment, achieving better differentiated adaptation. We evaluate the proposal against state-of-the-art FRL on the classical CartPole game with heterogeneous environments. Furthermore, we apply the proposal to the classical traffic offloading problem in the heterogeneous Space-Air-Ground Integrated Network (SAGIN). Simulation results show that the proposal achieves better global performance and fairness than the baselines in terms of throughput, delay, and packet drop rate.
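To make the abstract's core idea concrete, the following is a minimal sketch of a differentiated update in the spirit of DFRL: each agent keeps its own local policy that follows its local environment while only partially tracking a shared global trend model, instead of overwriting the local policy with a single global policy as in standard FRL. This is an illustrative assumption, not the authors' actual algorithm; the toy quadratic "environments", the step sizes ALPHA and BETA, and the function local_gradient are all hypothetical placeholders.

```python
# Sketch of a DFRL-style differentiated update (illustrative only, not the paper's algorithm).
import numpy as np

rng = np.random.default_rng(0)

NUM_AGENTS = 4   # heterogeneous agents, each with a different environment optimum
DIM = 8          # size of the toy policy parameter vector
ALPHA = 0.1      # step size toward the locally preferred policy
BETA = 0.05      # step size toward the shared global trend model
ROUNDS = 200

# Each toy "environment" is a quadratic whose optimum differs per agent,
# standing in for heterogeneous state-transition dynamics.
local_optima = [rng.normal(size=DIM) for _ in range(NUM_AGENTS)]

# Differentiated local policies plus one global trend model.
local_policies = [np.zeros(DIM) for _ in range(NUM_AGENTS)]
global_trend = np.zeros(DIM)

def local_gradient(theta, optimum):
    """Gradient of a toy quadratic reward surrogate: pulls theta toward the
    agent's own optimum (placeholder for a real policy-gradient estimate)."""
    return optimum - theta

for _ in range(ROUNDS):
    snapshots = []
    for i in range(NUM_AGENTS):
        # Differentiated local learning: follow the local environment's gradient
        # while only partially tracking the shared global trend.
        grad = local_gradient(local_policies[i], local_optima[i])
        local_policies[i] += ALPHA * grad + BETA * (global_trend - local_policies[i])
        snapshots.append(local_policies[i].copy())
    # Global trend learning: aggregate local policies instead of replacing them.
    global_trend = np.mean(snapshots, axis=0)

for i in range(NUM_AGENTS):
    gap = np.linalg.norm(local_policies[i] - local_optima[i])
    print(f"agent {i}: distance to its own environment optimum = {gap:.3f}")
```

Under these assumptions, each local policy converges near its own environment's optimum rather than collapsing onto the averaged global model, which is the behavior the abstract contrasts with the local strategy trap of traditional FRL.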