The high data-rate demand of next-generation wireless communication could be met by Non-Orthogonal Multiple Access (NOMA) in the millimetre-wave (mmW) frequency band. mmW-NOMA systems require joint power allocation and beamforming, which can be handled by optimization methods. To this end, we exploit Deep Reinforcement Learning (DRL), which generates a policy that leads to an optimized sum-rate of the users. An actor-critic scheme is used to measure the immediate reward and to provide the new action that maximizes the overall Q-value of the network. The immediate reward is defined as the sum of the rates of the two users, with the minimum guaranteed rate of each user and the sum of consumed power as constraints. Simulation results show that the proposed approach outperforms Time-Division Multiple Access (TDMA) and another optimized NOMA strategy in terms of the sum-rate of the users.
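The reward described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a standard two-user downlink NOMA model in which user 1 has the stronger effective channel and removes user 2's signal by successive interference cancellation, and in which `g1` and `g2` are effective channel gains that already include the beamforming gain. The function names, the default thresholds, and the choice to return a fixed `penalty` when a constraint is violated are all hypothetical; the abstract does not state how infeasible actions are scored.

```python
import math


def noma_rates(g1, g2, p1, p2, noise=1.0):
    """Achievable rates (bit/s/Hz) of a two-user downlink NOMA pair.

    Assumption: user 1 is the strong user and cancels user 2's signal (SIC),
    while user 2 treats user 1's signal as interference.
    """
    r1 = math.log2(1.0 + g1 * p1 / noise)
    r2 = math.log2(1.0 + g2 * p2 / (g2 * p1 + noise))
    return r1, r2


def immediate_reward(g1, g2, p1, p2, r_min=(0.5, 0.5), p_max=1.0,
                     noise=1.0, penalty=0.0):
    """Sum-rate if all constraints hold, otherwise a fixed penalty (assumed)."""
    r1, r2 = noma_rates(g1, g2, p1, p2, noise)
    feasible = (
        p1 >= 0.0 and p2 >= 0.0
        and p1 + p2 <= p_max      # total consumed power constraint
        and r1 >= r_min[0]        # minimum guaranteed rate, user 1
        and r2 >= r_min[1]        # minimum guaranteed rate, user 2
    )
    return r1 + r2 if feasible else penalty


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(immediate_reward(g1=10.0, g2=1.0, p1=0.25, p2=0.75))
```

In an actor-critic loop, the actor would propose the power split (and beamforming parameters, which are folded into `g1` and `g2` here), and a scalar of this kind would be fed to the critic as the immediate reward.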