Grant-free non-orthogonal multiple access (GF-NOMA) is a promising multiple access framework for short-packet internet-of-things (IoT) networks to enhance connectivity. However, the resource allocation problem in GF-NOMA is challenging due to the absence of closed-loop power control. We design a prototype transmit power pool (PP) to provide open-loop power control, from which IoT users acquire their transmit power in advance based solely on their communication distances. First, a multi-agent deep Q-network (DQN) aided GF-NOMA algorithm is proposed to determine the optimal transmit power levels for the prototype PP. More specifically, each IoT user acts as an agent and learns a policy by interacting with the wireless environment, which guides it to select optimal actions. Second, to prevent the overestimation problem of Q-learning models, a double DQN based GF-NOMA algorithm is proposed. Numerical results confirm that the double DQN based algorithm identifies the optimal transmit power levels that form the PP. Compared with the conventional online learning approach, the proposed algorithm with the prototype PP converges faster under changing environments because it limits the action space based on previous learning. In terms of throughput, the considered GF-NOMA system outperforms both networks with fixed transmit power (i.e., all users transmit at the same power) and traditional grant-free orthogonal multiple access techniques.
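To make the overestimation fix concrete, the following is a minimal sketch of the double DQN target computation for a single IoT agent, written in PyTorch. The network architecture and all dimensions and hyperparameters (state_dim, num_power_levels, gamma) are illustrative assumptions, not taken from the paper; the key mechanism is that the online network selects the greedy next action while a separate target network evaluates it, decoupling action selection from value estimation.

```python
import torch
import torch.nn as nn

# Hypothetical Q-network for one IoT agent: the state could encode local
# channel observations, and each discrete action indexes a transmit power
# level in the power pool (sizes here are assumptions for illustration).
class QNetwork(nn.Module):
    def __init__(self, state_dim=8, num_power_levels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, num_power_levels),
        )

    def forward(self, state):
        return self.net(state)

def double_dqn_targets(online, target, rewards, next_states, gamma=0.99):
    """Compute double DQN targets: the online network selects the action,
    the target network evaluates it, which curbs the max-operator
    overestimation of vanilla DQN / Q-learning."""
    with torch.no_grad():
        best_actions = online(next_states).argmax(dim=1, keepdim=True)
        next_q = target(next_states).gather(1, best_actions).squeeze(1)
    return rewards + gamma * next_q
```

In vanilla DQN the target would be rewards + gamma * target(next_states).max(dim=1), so the same network both picks and scores the maximizing action, which biases Q-value estimates upward; the decoupled form above is the standard double DQN remedy referenced in the abstract.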