Non-orthogonal multiple access (NOMA) exploits the power domain to enhance connectivity for the Internet of Things (IoT). Because communication channels vary over time, dynamic user clustering is a promising method for increasing the throughput of NOMA-IoT networks. This paper develops an intelligent resource allocation scheme for uplink NOMA-IoT communications. To maximise the average sum rate, this work designs an efficient optimization approach based on two reinforcement learning algorithms, namely deep reinforcement learning (DRL) and SARSA-learning. For light traffic, SARSA-learning is used to explore the safest resource allocation policy at low cost. For heavy traffic, DRL is used to handle the large number of variables that the traffic introduces. With the aid of the proposed approach, this work addresses two main problems of fair resource allocation in NOMA techniques: 1) allocating users dynamically and 2) balancing resource blocks and network traffic. We analytically demonstrate that the rate of convergence is inversely proportional to the network size. Numerical results show that: 1) compared with the optimal benchmark scheme, the proposed DRL and SARSA-learning algorithms have lower complexity with acceptable accuracy and 2) NOMA-enabled IoT networks outperform conventional orthogonal multiple access based IoT networks in terms of system throughput.
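For readers unfamiliar with SARSA-learning, the following is a minimal sketch of the standard tabular SARSA update with epsilon-greedy action selection. It is not the paper's implementation: the state labels, the interpretation of an action as a resource-block index, the reward value, and the parameters `alpha`, `gamma`, and `epsilon` are all illustrative assumptions.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


# Hypothetical usage: states are opaque labels, actions are resource-block indices.
q = defaultdict(float)          # Q-table, unseen entries default to 0.0
actions = [0, 1]                # assumed: two candidate resource blocks
rng = random.Random(0)

a = epsilon_greedy(q, "s0", actions, epsilon=0.1, rng=rng)
a_next = epsilon_greedy(q, "s1", actions, epsilon=0.1, rng=rng)
sarsa_update(q, "s0", a, reward=1.0, s_next="s1", a_next=a_next)
```

Because SARSA bootstraps from the action actually taken in the next state, it evaluates the policy being followed (exploration included), which is commonly cited as the reason it tends toward more conservative policies than off-policy methods.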