The emergent technology of Reconfigurable Intelligent Surfaces (RISs) has the potential to transform wireless environments into controllable systems through programmable propagation of information-bearing signals. Techniques stemming from the field of Deep Reinforcement Learning (DRL) have recently gained popularity for maximizing the sum-rate performance in multi-user communication systems empowered by RISs. Such approaches are commonly based on Markov Decision Processes (MDPs). In this paper, we instead investigate the sum-rate design problem within the Multi-Armed Bandits (MAB) setting, which is a relaxation of the MDP framework. In many cases, the MAB formulation is in fact better suited to the channel and system models under the assumptions typically made in the RIS literature. To this end, we propose a simpler DRL approach for orchestrating multiple metasurfaces in RIS-empowered multi-user Multiple-Input Single-Output (MISO) systems, which we numerically show to perform on par with a state-of-the-art MDP-based approach while being less computationally demanding.
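To make the MAB relaxation concrete, the sketch below casts RIS configuration selection as a stateless bandit: each arm is one candidate phase configuration, and the (noisy) reward is the achievable sum-rate under that configuration. This is a minimal illustrative example, not the paper's method; the `sum_rate` callable, the epsilon-greedy rule, and all parameter names are hypothetical stand-ins for an actual channel/system simulation.

```python
import random


def epsilon_greedy_bandit(sum_rate, n_arms, n_rounds, epsilon=0.1, seed=0):
    """Stateless MAB over RIS phase configurations (illustrative sketch).

    sum_rate : callable mapping an arm index (a candidate RIS phase
               configuration) to an observed sum-rate reward; here it
               stands in for the actual multi-user MISO simulation.
    Returns the empirically best arm and the per-arm mean-reward estimates.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(n_rounds):
        if t < n_arms:
            arm = t  # pull every configuration once first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore a random configuration
        else:
            arm = max(range(n_arms), key=means.__getitem__)  # exploit
        reward = sum_rate(arm)
        counts[arm] += 1
        # incremental update of the running mean sum-rate for this arm
        means[arm] += (reward - means[arm]) / counts[arm]
    best = max(range(n_arms), key=means.__getitem__)
    return best, means


# Toy usage with a hypothetical deterministic reward table: arm 3 yields
# the highest sum-rate, so the bandit should converge to it.
rates = [1.0, 1.2, 0.8, 2.0]
best_arm, estimates = epsilon_greedy_bandit(rates.__getitem__, n_arms=4, n_rounds=200)
```

Because the bandit keeps no state-transition model, each decision round only updates one running mean, which is what makes such formulations computationally lighter than MDP-based DRL when the environment is effectively stateless between configuration choices.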