No-Pain No-Gain: DRL Assisted Optimization in Energy-Constrained CR-NOMA Networks

This paper applies machine learning to optimize the transmission policy of cognitive-radio-inspired non-orthogonal multiple access (CR-NOMA) networks, where time-division multiple access (TDMA) is used to serve multiple primary users and an energy-constrained secondary user is admitted to the primary users' time slots via NOMA. During each time slot, the secondary user performs two tasks: data transmission, and energy harvesting from the signals received from the primary users. The goal of the paper is to maximize the secondary user's long-term throughput by optimizing its transmit power and the time-sharing coefficient between its two tasks. This long-term throughput maximization problem is challenging because it requires decisions that yield long-term gains but may incur short-term losses. For example, when a primary user with large channel gains transmits in a given time slot, intuition suggests that the secondary user should not transmit data, because of the strong interference from the primary user, and should perform energy harvesting only. This results in a zero data rate for that time slot but yields potential long-term benefits. In this paper, a deep reinforcement learning (DRL) approach is applied to emulate this intuition, where the deep deterministic policy gradient (DDPG) algorithm is employed together with convex optimization. Our simulation results demonstrate that the proposed DRL-assisted NOMA transmission scheme can yield significant performance gains over two benchmark schemes.
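
The "no pain, no gain" intuition can be illustrated with a toy simulation. This is a minimal sketch, and every modelling detail in it is an assumption rather than something taken from the paper:

- The secondary user harvests energy during a fraction tau of each slot and transmits during the remaining 1 - tau.
- Energy harvested in a slot can be spent in that same slot.
- The primary user's signal is treated as interference when the secondary user is decoded.
- The gains, powers and efficiency ETA are hypothetical values.

The sketch does not implement the paper's DDPG-plus-convex-optimization scheme. Instead, it compares a greedy per-slot policy with an exhaustive search over a small action grid. Exhaustive search is only tractable for a handful of slots, and it is the kind of long-horizon search a DRL agent is meant to approximate. The helper names `slot_step`, `greedy` and `exhaustive` are hypothetical.

```python
import itertools
import math

# Hypothetical toy parameters: assumptions for illustration, not values from the paper.
ETA = 0.7      # energy-harvesting efficiency
P_PU = 1.0     # primary-user transmit power (normalized)
NOISE = 1.0    # normalized noise power at the base station
H_SU = 1.0     # secondary user -> base station channel gain

# Action grid: (tau, spend_frac). tau is the harvesting share of the slot;
# spend_frac is the fraction of the available energy spent on transmission.
TAUS = (0.0, 0.25, 0.5, 0.75, 1.0)
SPEND = (0.0, 0.25, 0.5, 0.75, 1.0)
ACTIONS = list(itertools.product(TAUS, SPEND))


def slot_step(battery, action, g_pu, f_pu):
    """Simulate one TDMA slot for the secondary user (toy model).

    g_pu: primary user -> base station gain (interference on the SU signal)
    f_pu: primary user -> secondary user gain (source of harvested energy)
    Returns (rate in bit/s/Hz for this slot, energy left for the next slot).
    """
    tau, spend_frac = action
    available = battery + ETA * tau * P_PU * f_pu   # stored + harvested energy
    if tau >= 1.0:
        return 0.0, available                        # harvest-only slot: zero rate
    spent = spend_frac * available
    power = spent / (1.0 - tau)                      # energy spread over the transmit phase
    # Assumed decoding order: SU decoded while treating the PU signal as interference.
    sinr = power * H_SU / (P_PU * g_pu + NOISE)
    rate = (1.0 - tau) * math.log2(1.0 + sinr)
    return rate, available - spent


def greedy(channels, battery=0.0):
    """Pick, slot by slot, the action that maximizes only the current rate."""
    total, plan = 0.0, []
    for g, f in channels:
        best = max(ACTIONS, key=lambda a: slot_step(battery, a, g, f)[0])
        rate, battery = slot_step(battery, best, g, f)
        total += rate
        plan.append(best)
    return total, plan


def exhaustive(channels, battery=0.0):
    """Try every action sequence and maximize the summed (long-term) rate."""
    def evaluate(plan):
        b, total = battery, 0.0
        for a, (g, f) in zip(plan, channels):
            rate, b = slot_step(b, a, g, f)
            total += rate
        return total

    best_plan = max(itertools.product(ACTIONS, repeat=len(channels)), key=evaluate)
    return evaluate(best_plan), list(best_plan)


# Slot 0: strong primary user (large gains); slot 1: weak primary user.
channels = [(20.0, 5.0), (0.1, 0.1)]
g_total, g_plan = greedy(channels)
o_total, o_plan = exhaustive(channels)
print(f"greedy:    {g_total:.3f} bit/s/Hz, plan {g_plan}")
print(f"long-term: {o_total:.3f} bit/s/Hz, plan {o_plan}")
```

In this toy setting, the exhaustive search harvests only in the strong-interference slot (tau = 1) and then spends all of the stored energy in the weak-interference slot. The greedy policy transmits a little in both slots and ends with a much lower total rate. Parameterizing the transmit power through the fraction of available energy spent keeps every grid action feasible. This is only a convenience of the sketch.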
