Reinforcement Learning (RL) approaches have recently been deployed for orchestrating wireless communications empowered by Reconfigurable Intelligent Surfaces (RISs), leveraging their online optimization capabilities. Most commonly, in RL-based formulations for realistic RISs with low-resolution phase-tunable elements, each configuration is modeled as a distinct reflection action, resulting in inefficient exploration due to the exponential growth of the search space. In this paper, we consider RISs with 1-bit phase resolution elements and model the overall reflection action as a binary vector, with each entry indicating the feasible reflection coefficient selected by the corresponding element. We then introduce two variations of the well-established Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) agents, aiming for effective exploration of binary action spaces. For the case of DQN, we make use of an efficient approximation of the Q-function, whereas for DDPG a discretization post-processing step is applied to the actor output. Our simulation results showcase that the proposed techniques greatly outperform the baseline in terms of the rate maximization objective when large-scale RISs are considered. In addition, for moderate RIS sizes, where the conventional DQN based on configuration-based action spaces remains feasible, the latter performs similarly to the proposed learning approaches.
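To illustrate the binary action modeling and the DDPG discretization step described above, the following minimal sketch (a hedged, self-contained illustration, not the paper's actual implementation; the function names discretize_action and configuration_from_bits, the phase codebook, and the 16-element setup are assumptions) shows how a continuous actor output could be mapped to a 1-bit phase configuration vector and then to unit-modulus reflection coefficients.

```python
import numpy as np

# Assumed codebook: 1-bit phase resolution means each RIS element picks one of
# two feasible reflection phases, here taken as {0, pi} for illustration.
FEASIBLE_PHASES = np.array([0.0, np.pi])

def discretize_action(actor_output: np.ndarray) -> np.ndarray:
    """Map a continuous actor output in [-1, 1]^N to a binary vector in {0, 1}^N,
    one bit per RIS element (a simple sign-based discretization post-processing)."""
    return (actor_output >= 0.0).astype(int)

def configuration_from_bits(bits: np.ndarray) -> np.ndarray:
    """Translate the binary action vector into unit-modulus reflection coefficients."""
    return np.exp(1j * FEASIBLE_PHASES[bits])

# Example with a stand-in (random) actor output for a hypothetical 16-element RIS.
rng = np.random.default_rng(0)
actor_out = rng.uniform(-1.0, 1.0, size=16)   # placeholder for a DDPG actor's output
bits = discretize_action(actor_out)           # binary RIS action vector
phi = configuration_from_bits(bits)           # reflection coefficients applied by the RIS
print(bits)
print(np.angle(phi))
```

Under this modeling, the action space grows linearly in the number of elements (one bit each) rather than enumerating all 2^N configurations as distinct actions, which is the source of the exploration inefficiency noted above.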