The explosive growth of dynamic and heterogeneous data traffic poses great challenges to 5G and beyond mobile networks. To enhance network capacity and reliability, we propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme that adaptively allocates the uplink and downlink time-frequency resources of base stations (BSs) to meet asymmetric and heterogeneous traffic demands while alleviating inter-cell interference. We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) that maximizes the long-term expected sum rate under the users' packet dropping ratio constraints. To jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm named federated Wolpertinger deep deterministic policy gradient (FWDDPG). Each BS decides its local time-frequency configuration through an RL algorithm, and global training is achieved by exchanging local RL models with neighboring BSs under a decentralized federated learning framework. Specifically, to deal with the large-scale discrete action space of each BS, we adopt a DDPG-based algorithm to generate actions in a continuous space and then utilize the Wolpertinger policy to reduce the errors of mapping from the continuous action space back to the discrete action space. Simulation results demonstrate the superiority of the proposed algorithm over benchmark algorithms in terms of system sum rate.
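To make the continuous-to-discrete mapping concrete, below is a minimal sketch of the Wolpertinger action-selection step: the DDPG actor emits a continuous proto-action, the k nearest discrete configurations are retrieved, and the critic re-ranks them. This assumes a Euclidean embedding of the discrete uplink/downlink configurations; the function names, the placeholder critic, and all parameters are illustrative, not the paper's implementation.

```python
import numpy as np

def wolpertinger_action(proto_action, discrete_actions, critic_q, state, k=5):
    """Map a continuous proto-action from the DDPG actor to a discrete action.

    proto_action:     continuous action vector produced by the actor network.
    discrete_actions: (N, d) array enumerating the discrete action space
                      (e.g., candidate uplink/downlink time-frequency configs).
    critic_q:         callable critic_q(state, action) -> scalar Q-value.
    k:                number of nearest discrete neighbors to refine over.
    """
    # 1. Retrieve the k discrete actions closest to the proto-action (L2 distance).
    dists = np.linalg.norm(discrete_actions - proto_action, axis=1)
    candidates = discrete_actions[np.argsort(dists)[:k]]
    # 2. Let the critic re-rank the candidates and pick the highest-Q one.
    #    This refinement is what reduces the mapping error relative to
    #    simply taking the single nearest neighbor.
    q_values = [critic_q(state, a) for a in candidates]
    return candidates[int(np.argmax(q_values))]

# Toy usage with a stand-in critic (a trained critic network in practice).
rng = np.random.default_rng(0)
actions = rng.integers(0, 2, size=(64, 8)).astype(float)  # 64 binary UL/DL configs
state = rng.normal(size=16)
fake_q = lambda s, a: float(a.sum())   # hypothetical placeholder critic
proto = rng.uniform(0, 1, size=8)      # actor output in [0, 1]^8
print(wolpertinger_action(proto, actions, fake_q, state, k=5))
```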