The combination of energy harvesting (EH), cognitive radio (CR), and non-orthogonal multiple access (NOMA) is a promising way to improve the energy efficiency and spectral efficiency of upcoming beyond-fifth-generation (B5G) networks, especially for supporting wireless sensor communications in Internet of Things (IoT) systems. However, how to allocate frequency, time, and energy resources intelligently to achieve better performance remains an important open problem. In this paper, we study joint spectrum, energy, and time resource management for EH-CR-NOMA IoT systems. Our goal is to minimize the number of data packet losses across all secondary sensing users (SSUs), while satisfying constraints on the maximum battery charging capacity, maximum transmit power, maximum buffer capacity, and minimum data rates of the primary users (PUs) and SSUs. Because this optimization problem is non-convex and the wireless environment is stochastic, we propose a distributed multidimensional resource management algorithm based on deep reinforcement learning (DRL). Since the resources to be managed are continuous, we adopt the deep deterministic policy gradient (DDPG) algorithm, with which each agent (SSU) can manage its own multidimensional resources without collaboration. In addition, a simplified but practical action adjuster (AA) is introduced to improve training efficiency and protect battery performance. The results show that the proposed algorithm converges about 4 times faster than DDPG, and that its average number of packet losses (ANPL) is about 8 times lower than that of the greedy algorithm.
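To make the idea of an action adjuster concrete, the following is a minimal sketch of how a raw continuous action from a DDPG-style actor might be projected onto the feasible set defined by the power and battery constraints. The abstract does not specify the AA's actual rule, so everything here is an assumption made for illustration: the `Limits` fields, the `adjust_action` and `update_battery` functions, the two-dimensional action (transmit power and transmit-time fraction), and the linear harvesting model are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical per-SSU limits; names and units are assumptions."""
    p_max: float        # maximum transmit power (W)
    battery_max: float  # maximum battery capacity (J)
    slot: float         # time-slot duration (s)


def adjust_action(raw_power, raw_tx_fraction, battery, limits):
    """Clip a raw actor output to a feasible (power, tx_fraction) pair.

    Illustrative rule only, not the paper's AA: bound each component,
    then scale power down so the energy spent fits the stored energy.
    """
    # Fraction of the slot used for transmission; the rest is for harvesting.
    tx_fraction = min(max(raw_tx_fraction, 0.0), 1.0)
    # Transmit power must stay within [0, p_max].
    power = min(max(raw_power, 0.0), limits.p_max)
    tx_time = tx_fraction * limits.slot
    # Energy spent in this slot cannot exceed the energy in the battery.
    if tx_time > 0.0 and power * tx_time > battery:
        power = battery / tx_time
    return power, tx_fraction


def update_battery(battery, power, tx_fraction, harvest_rate, limits):
    """Advance the battery one slot under an assumed linear harvesting model."""
    tx_time = tx_fraction * limits.slot
    harvested = harvest_rate * (limits.slot - tx_time)
    # Stored energy is capped at the maximum battery capacity.
    return min(battery - power * tx_time + harvested, limits.battery_max)
```

Applying a projection of this kind before the environment step means the agent never executes an infeasible action, which is one plausible reason an adjuster could speed up training relative to plain DDPG.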