This paper presents a novel and effective deep reinforcement learning (DRL)-based approach to joint resource management (JRM) in a practical multi-carrier non-orthogonal multiple access (MC-NOMA) system, where hardware sensitivity and imperfect successive interference cancellation (SIC) are considered. We first formulate the JRM problem to maximize the weighted-sum system throughput. The JRM problem is then decoupled into two iterative subtasks: subcarrier assignment (SA, including user grouping) and power allocation (PA). Each subtask is a sequential decision process. Built on the deep deterministic policy gradient (DDPG) algorithm, our proposed DRL-based JRM (DRL-JRM) approach performs the two subtasks jointly, with the optimization objective and constraints of the subtasks handled by a new joint reward and internal reward mechanism. A multi-agent structure and a convolutional neural network are adopted to reduce the complexity of the PA subtask. We also tailor the neural network structure to improve the stability and convergence of DRL-JRM. Extensive experiments corroborate that the proposed DRL-JRM scheme outperforms existing alternatives in system throughput and resistance to interference, especially in the presence of many users and strong inter-cell interference. DRL-JRM can also flexibly meet the individual service requirements of users.
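To make the optimization objective concrete, the following is a minimal sketch of a weighted-sum throughput computation for the users sharing one downlink NOMA subcarrier under imperfect SIC. It is not the paper's formulation: the function name `weighted_sum_throughput`, the parameter `sic_residual` (the fraction of a cancelled signal's power that remains as interference), the decoding order by channel gain, and the folding of inter-cell interference into `noise` are all assumptions made for illustration. Channel gains are assumed to be distinct.

```python
import math


def weighted_sum_throughput(powers, gains, weights, noise=1.0, sic_residual=0.0):
    """Weighted-sum rate (bit/s/Hz) of users sharing one downlink NOMA subcarrier.

    Hypothetical model, not taken from the paper:
      - powers[k]:  transmit power allocated to user k
      - gains[k]:   channel power gain of user k (assumed distinct)
      - weights[k]: priority weight of user k
      - noise:      noise power (inter-cell interference assumed folded in)
      - sic_residual: fraction of a cancelled signal left over after SIC
                      (0.0 means perfect SIC)
    """
    total = 0.0
    for k, (p_k, g_k, w_k) in enumerate(zip(powers, gains, weights)):
        interference = 0.0
        for j, (p_j, g_j) in enumerate(zip(powers, gains)):
            if j == k:
                continue
            if g_j > g_k:
                # Signal intended for a stronger user: treated as full interference.
                interference += p_j * g_k
            else:
                # Signal intended for a weaker user: cancelled by SIC,
                # leaving only a residual fraction of its power.
                interference += sic_residual * p_j * g_k
        sinr = p_k * g_k / (interference + noise)
        total += w_k * math.log2(1.0 + sinr)
    return total


# Two users on one subcarrier: the weaker user (gain 1.0) gets more power.
perfect = weighted_sum_throughput([1.0, 3.0], [4.0, 1.0], [1.0, 1.0])
imperfect = weighted_sum_throughput([1.0, 3.0], [4.0, 1.0], [1.0, 1.0],
                                    sic_residual=0.1)
```

In this sketch, raising `sic_residual` above zero can only lower the result, which is the effect of imperfect SIC that a resource management scheme has to account for. In a DRL setting, a quantity of this kind would typically feed the reward signal; how the paper's joint and internal rewards are actually defined is not specified in the abstract.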