A wireless network operator typically divides its radio spectrum into a number of subbands. In a cellular network, those subbands are then reused in many cells. To mitigate co-channel interference, a joint spectrum and power allocation problem is often formulated to maximize a sum-rate objective. The best-known algorithms for solving such problems generally require instantaneous global channel state information and a centralized optimizer; in fact, those algorithms have not been implemented in practice in large networks with time-varying subbands. Deep reinforcement learning algorithms are promising tools for solving complex resource management problems. A major challenge here is that spectrum allocation involves discrete subband selection, whereas power allocation involves continuous variables. In this paper, a learning framework is proposed to optimize both discrete and continuous decision variables. Specifically, two separate deep reinforcement learning algorithms are designed to be executed and trained simultaneously to maximize a joint objective. Simulation results show that the proposed scheme outperforms both the state-of-the-art fractional programming algorithm and a previous solution based on deep reinforcement learning.
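To make the division of labor concrete, the minimal PyTorch sketch below illustrates the general idea of pairing a discrete and a continuous deep reinforcement learning decision: a small Q-network selects a subband while a small actor network outputs a transmit power, and both are updated simultaneously from the same reward. This is an illustration under assumed settings, not the paper's implementation; the dimensions (N_SUBBANDS, STATE_DIM, P_MAX), the toy_reward stand-in for the sum-rate objective, and the one-step (non-bootstrapped) Q-target are all simplifying assumptions.

# Sketch only: a discrete Q-network (subband choice) and a continuous actor
# (power choice) trained side by side from a shared reward. All dimensions,
# names, and the toy reward model are illustrative assumptions.
import torch
import torch.nn as nn

N_SUBBANDS = 4   # assumed number of subbands
STATE_DIM  = 8   # assumed size of the local state (e.g., measured interference)
P_MAX      = 1.0 # assumed maximum transmit power (normalized)

# Discrete decision: Q-values over subbands (DQN-style).
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_SUBBANDS))

# Continuous decision: a deterministic actor mapping state -> power in (0, P_MAX).
actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

q_opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
a_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def toy_reward(subband, power, state):
    # Stand-in for the sum-rate reward the environment would return.
    gain = state[subband].abs() + 0.1
    return torch.log2(1.0 + gain * power)

for step in range(100):
    state = torch.randn(STATE_DIM)

    # Simultaneous execution: both decisions are made from the same state.
    q_values = q_net(state)
    subband = int(torch.argmax(q_values))  # greedy for brevity; epsilon-greedy in practice
    power = P_MAX * actor(state).squeeze()

    reward = toy_reward(subband, power, state)

    # Q-network update: regress the chosen action's value toward the observed
    # reward (a one-step bandit-style target, omitting bootstrapping for brevity).
    q_loss = (q_values[subband] - reward.detach()) ** 2
    q_opt.zero_grad()
    q_loss.backward()
    q_opt.step()

    # Actor update: ascend the reward gradient with respect to the power decision.
    a_loss = -toy_reward(subband, P_MAX * actor(state).squeeze(), state)
    a_opt.zero_grad()
    a_loss.backward()
    a_opt.step()

In a full system, the one-step targets above would be replaced by the usual bootstrapped targets with replay buffers and target networks; the sketch only shows how a discrete and a continuous learner can share a state and a joint reward.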