A Q-Learning-based Approach for Distributed Beam Scheduling in mmWave Networks

We consider the problem of distributed downlink beam scheduling and power allocation for millimeter-wave (mmWave) cellular networks in which multiple base stations (BSs) belonging to different service operators share the same unlicensed spectrum with no central coordination or cooperation among them. Our goal is to design efficient distributed beam scheduling and power allocation algorithms that maximize the network-level payoff, defined as the weighted sum of the total throughput and a power penalization term. To this end, we model each BS as an independent Q-learning agent and propose a distributed scheduling approach that allocates and adapts power for efficient interference management over the shared spectrum. As a baseline, we compare the proposed approach to the state-of-the-art non-cooperative game-based approach previously developed for the same problem. We conduct extensive experiments under various scenarios to verify the effect of multiple factors on the performance of both approaches. Experimental results show that the proposed approach adapts well to different interference situations by learning from experience and can achieve a higher payoff than the game-based approach. The proposed approach can also be integrated into our previously developed Lyapunov stochastic optimization framework for network utility maximization with an optimality guarantee. As a result, the weights in the payoff function can be automatically and optimally determined by the virtual queue values from the sub-problems derived from the Lyapunov optimization framework.
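To make the idea of "each BS as an independent Q-learning agent" concrete, the following is a minimal toy sketch, not the algorithm from the paper. It covers power selection only and omits beam selection. Everything specific in it is an assumption made for illustration: the two-BS setup, the discrete power levels, the channel gains, the noise power, the state definition (the interference level a BS observed in the previous slot), and the per-BS payoff with its `weight` and `penalty` parameters. Each agent updates only its own Q-table from its own payoff, so no information is exchanged between BSs.

```python
import math
import random

# All constants below are hypothetical values chosen for illustration.
POWER_LEVELS = [0.0, 0.5, 1.0]  # discrete transmit powers available to a BS
NOISE = 0.1                     # receiver noise power
DIRECT_GAIN = 1.0               # gain from a BS to its own user
CROSS_GAIN = 0.3                # gain from a BS to the other BS's user


def payoff(own_power, other_power, weight=1.0, penalty=0.5):
    """Toy per-BS payoff: weighted throughput minus a power penalty."""
    sinr = DIRECT_GAIN * own_power / (NOISE + CROSS_GAIN * other_power)
    return weight * math.log2(1.0 + sinr) - penalty * own_power


def train(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Train two independent tabular Q-learning agents, one per BS."""
    rng = random.Random(seed)
    n = len(POWER_LEVELS)
    # q[b][s][a]: value for BS b in state s (index of the interfering
    # power level observed in the previous slot) of taking action a.
    q = [[[0.0] * n for _ in range(n)] for _ in range(2)]
    states = [0, 0]
    for _ in range(episodes):
        # Each BS picks a power level epsilon-greedily from its own table.
        actions = []
        for b in range(2):
            if rng.random() < epsilon:
                actions.append(rng.randrange(n))
            else:
                row = q[b][states[b]]
                actions.append(max(range(n), key=row.__getitem__))
        # Each BS observes only its own payoff and the interference level.
        for b in range(2):
            other = actions[1 - b]
            reward = payoff(POWER_LEVELS[actions[b]], POWER_LEVELS[other])
            next_state = other
            # Standard Q-learning update toward the bootstrapped target.
            target = reward + gamma * max(q[b][next_state])
            s, a = states[b], actions[b]
            q[b][s][a] += alpha * (target - q[b][s][a])
            states[b] = next_state
    return q


if __name__ == "__main__":
    tables = train()
    for b, table in enumerate(tables):
        # Greedy power level per observed interference state.
        greedy = [POWER_LEVELS[max(range(len(row)), key=row.__getitem__)]
                  for row in table]
        print(f"BS {b}: greedy power per state = {greedy}")
```

In this sketch the `weight` and `penalty` arguments are fixed by hand. In the integration described in the abstract, such weights would instead come from the virtual queue values of the Lyapunov optimization framework.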
