In this paper, we exploit multi-agent deep reinforcement learning (MA-DRL) to generate a transmit power pool (PP) for Internet of Things (IoT) networks with semi-grant-free non-orthogonal multiple access (SGF-NOMA). The PP is mapped to each resource block (RB) to achieve distributed transmit power control (DPC). We first formulate the resource (sub-channel and transmit power) selection problem as a stochastic Markov game, and then solve it using two competitive MA-DRL algorithms, namely double deep Q network (DDQN) and Dueling DDQN. Each grant-free (GF) user acts as an agent and tries to find the optimal transmit power level and RB to form the desired PP. The dueling architecture enhances learning by estimating the value of a state without having to evaluate the effect of each action in that state. Accordingly, DDQN is designed for communication scenarios with a small action-state space, while Dueling DDQN is designed for scenarios with a large one. Our results show that the proposed MA-Dueling DDQN based SGF-NOMA with DPC outperforms the SGF-NOMA system with a fixed-power-control mechanism and networks with pure GF protocols, with gains of 17.5% and 22.2% in system throughput, respectively. Moreover, to decrease the training time, we eliminate invalid actions (high transmit power levels) to reduce the action space. We show that the proposed algorithm is computationally scalable to massive IoT networks. Finally, to control the interference and guarantee the quality-of-service requirements of grant-based users, we find the optimal number of GF users for each sub-channel.
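As a rough illustration of the three ingredients named above (dueling value estimation, double-Q targets, and invalid-action elimination), the following minimal Python sketch may help. It is not the paper's implementation: the function names, the mean-advantage aggregation, and the toy numbers are all assumptions made for illustration, and the neural networks are replaced by plain lists of numbers.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a).

    Assumption: the common mean-advantage aggregation
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def mask_invalid_actions(q_values: Sequence[float], valid: Sequence[bool]) -> List[float]:
    """Set the Q-value of each invalid action (e.g. a too-high power level) to -inf."""
    return [q if ok else float("-inf") for q, ok in zip(q_values, valid)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_dqn_target(reward: float, gamma: float,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      done: bool = False) -> float:
    """Double-DQN target: the online estimate selects the next action,
    the target estimate evaluates it."""
    if done:
        return reward
    best_action = argmax(next_q_online)
    return reward + gamma * next_q_target[best_action]


# Hypothetical toy case: three power levels on one sub-channel,
# with the highest level treated as invalid.
q_online = dueling_q_values(1.0, [0.0, 1.0, 2.0])
q_masked = mask_invalid_actions(q_online, [True, True, False])
target = double_dqn_target(1.0, 0.5, q_masked, [4.0, 2.0, 9.0])
```

In this toy case the masked online estimate selects the middle power level, so the target uses the target-network value of that action rather than the larger value attached to the eliminated one.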