We study joint active/passive beamforming and channel blocklength (CBL) allocation in a non-ideal reconfigurable intelligent surface (RIS)-aided ultra-reliable and low-latency communication (URLLC) system. The considered scenario is the finite blocklength (FBL) regime. We solve the problem with a novel deep reinforcement learning (DRL) algorithm named twin-delayed deep deterministic policy gradient (TD3). First, assuming an industrial automation system with multiple actuators, we characterize the signal-to-interference-plus-noise ratio (SINR) and the achievable FBL rate of each actuator in terms of the phase shift configuration matrix at the RIS. Next, we formulate the joint active/passive beamforming and CBL optimization problem. Its objective is to maximize the total achievable FBL rate over all actuators, subject to the non-linear amplitude response at the RIS elements, the BS transmit power budget, and the total available CBL. Since the amplitude response equality constraint is highly non-convex and non-linear, we employ an actor-critic policy gradient DRL algorithm based on TD3. In this method, the RIS interacts with the industrial automation environment by taking actions, namely the phase shifts at the RIS elements, the CBL variables, and the BS beamforming, so as to maximize the expected observed reward, i.e., the total FBL rate. We assess the performance loss of the system when the RIS is non-ideal, i.e., has a non-linear amplitude response, and compare it with an ideal RIS without impairments. The numerical results show that optimizing the RIS phase shifts, BS beamforming, and CBL variables via the proposed TD3 method is highly beneficial for improving the network total FBL rate. They also show that the proposed method, which uses a deterministic policy, outperforms conventional methods.
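The reward above is the total FBL rate summed over actuators, with each actuator's rate depending on its SINR and its allocated CBL. The abstract does not give the exact rate expression. The sketch below therefore assumes the widely used normal approximation for the AWGN channel, R ≈ log2(1+γ) − sqrt(V/n)·Q⁻¹(ε)·log2(e) with dispersion V = 1 − (1+γ)⁻², where γ is the SINR, n the blocklength, and ε the target decoding error probability. The function names `fbl_rate` and `total_fbl_rate`, the default ε, and the clipping of negative rates to zero are hypothetical illustration choices, not taken from the paper.

```python
import math
from statistics import NormalDist


def q_inv(eps: float) -> float:
    """Inverse Gaussian Q-function: Q^{-1}(eps) = Phi^{-1}(1 - eps)."""
    return NormalDist().inv_cdf(1.0 - eps)


def fbl_rate(sinr: float, n: int, eps: float) -> float:
    """Normal-approximation achievable rate in bits per channel use.

    Assumption: standard FBL normal approximation for an AWGN channel;
    the paper's exact expression may differ.
    """
    if sinr <= 0 or n <= 0:
        return 0.0
    capacity = math.log2(1.0 + sinr)           # Shannon term (bits/channel use)
    dispersion = 1.0 - (1.0 + sinr) ** -2      # channel dispersion V (nats^2)
    penalty = math.sqrt(dispersion / n) * q_inv(eps) * math.log2(math.e)
    return max(capacity - penalty, 0.0)        # hypothetical choice: clip at zero


def total_fbl_rate(sinrs, blocklengths, eps=1e-5, total_cbl=None):
    """Hypothetical reward: sum of per-actuator FBL rates.

    Optionally checks the total-CBL budget constraint sum(n_k) <= total_cbl.
    """
    if len(sinrs) != len(blocklengths):
        raise ValueError("need one blocklength per actuator")
    if total_cbl is not None and sum(blocklengths) > total_cbl:
        raise ValueError("CBL allocation exceeds the total available CBL")
    return sum(fbl_rate(g, n, eps) for g, n in zip(sinrs, blocklengths))
```

For example, `total_fbl_rate([3.0, 5.0], [200, 300], eps=1e-5, total_cbl=500)` scores one candidate action (per-actuator SINRs produced by a given beamforming and phase-shift choice, plus a CBL split). In a DRL setup such as TD3, this value would serve as the reward signal. Shorter blocklengths increase the dispersion penalty, which is why the CBL split and the beamforming are optimized jointly.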