In heterogeneous networks (HetNets), the overlap of small cells and the macro cell causes severe cross-tier interference. Although some approaches exist to address this problem, they usually require global channel state information, which is hard to obtain in practice, and yield sub-optimal power allocation policies at high computational cost. To overcome these limitations, we propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet, in which each access point makes power control decisions independently based on local information. To promote cooperation among agents, we develop a penalty-based Q-learning (PQL) algorithm for MADRL systems. By introducing regularization terms into the loss function, each agent tends to choose an experienced action with high reward when revisiting a state, which slows down the policy updating speed. In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process. We then implement the proposed PQL in the considered HetNet and compare it with other distributed-training-and-execution (DTE) algorithms. Simulation results show that the proposed PQL can learn the desired power control policy from a dynamic environment where user locations change episodically, and that it outperforms existing DTE MADRL algorithms.
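To make the penalty mechanism concrete, the following is a minimal sketch of how a regularized Q-learning loss of this kind could look for a single agent. It assumes a DQN-style agent; the network architecture, the penalty weight lambda_pen, and the exact form of the regularizer (here, a cross-entropy pull toward the highest-reward action previously taken in a revisited state) are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: a DQN-style TD loss plus a penalty term that
# discourages deviating from a previously high-reward action when a state
# is revisited, which slows down policy updates as described in the abstract.
# The concrete regularizer used by PQL may differ.

class QNet(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


def penalized_q_loss(q_net, target_net, batch, gamma=0.99, lambda_pen=0.1):
    """Standard TD loss plus a penalty pulling the greedy policy toward the
    best action recorded earlier for the same (revisited) state."""
    # best_prev_a: the highest-reward action previously taken in state s
    s, a, r, s_next, best_prev_a = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values
    td_loss = nn.functional.mse_loss(q_sa, target)
    # Penalty term: cross-entropy between the Q-values (as logits) and the
    # remembered high-reward action, so the agent tends to repeat it.
    penalty = nn.functional.cross_entropy(q_net(s), best_prev_a)
    return td_loss + lambda_pen * penalty
```

In a multi-agent deployment, each access point would maintain its own q_net and replay memory and evaluate this loss on local observations only, consistent with the distributed-training-and-execution setting described above.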