In this work, a two-stage deep reinforcement learning (DRL) approach is presented for a full-duplex (FD) transmission scenario that does not rely on channel state information (CSI) to predict the phase-shifts of a reconfigurable intelligent surface (RIS), the beamformers at the base station (BS), and the transmit powers of the BS and uplink (UL) users, in order to maximize the weighted sum rate of uplink and downlink (DL) users. As self-interference (SI) cancellation and beamformer design are coupled problems, the first stage uses a least-squares method to partially cancel SI and initiate learning, while the second stage uses DRL to make predictions and achieve performance close to that of methods with perfect CSI knowledge. Further, to reduce signaling from the BS to the RISs, a DRL framework is proposed that predicts quantized RIS phase-shifts and beamformers using $32$ times fewer bits than the continuous version. The quantized methods have a reduced action space and therefore faster convergence; with sufficient training, the UL and DL rates of the quantized-phase method are $8.14\%$ and $2.45\%$ higher, respectively, than those of the continuous-phase method. The RIS elements can also be grouped to share phase-shifts, further reducing signaling at the cost of some performance.
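To make the bit-saving idea concrete, the following is a minimal sketch of phase-shift quantization and element grouping, not the paper's implementation: the function names, the 1-bit quantization level (one plausible way to obtain a $32\times$ reduction versus 32-bit continuous phases), and the group size are all assumptions for illustration.

```python
import numpy as np

def quantize_phases(theta, bits=1):
    """Map continuous phases in [0, 2*pi) to the nearest of 2**bits discrete levels."""
    levels = 2 ** bits
    step = 2 * np.pi / levels
    return (np.round(theta / step) % levels) * step

def group_phases(theta, group_size):
    """Force each group of adjacent RIS elements to share one phase (the group mean)."""
    grouped = theta.reshape(-1, group_size).mean(axis=1, keepdims=True)
    return np.repeat(grouped, group_size, axis=1).ravel()

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 16)        # continuous phase-shifts, 16 RIS elements
theta_q = quantize_phases(theta, bits=1)     # 1-bit phases: values in {0, pi}
theta_g = group_phases(theta, group_size=4)  # 4 elements share one phase each
```

With 1-bit phases, each element needs a single bit per update instead of a 32-bit float, and grouping divides the number of distinct phases to signal by the group size, matching the abstract's trade-off between signaling overhead and performance.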