Reconfigurable intelligent surfaces (RISs) can potentially combat jamming attacks by diffusing jamming signals. This paper jointly optimizes user selection, channel allocation, modulation-coding, and RIS configuration in a multiuser OFDMA system under a jamming attack. The problem is non-trivial and has not previously been addressed, because of its mixed-integer programming nature and the difficulty of acquiring channel state information (CSI) involving the RIS and the jammer. We propose a new deep reinforcement learning (DRL)-based approach that learns only from changes in the users' received data rates to reject the jamming signals and maximize the sum rate of the system. The key idea is to decouple the discrete selection of users, channels, and modulation-coding from the continuous RIS configuration, so that the RIS configuration can be learned with the twin delayed deep deterministic policy gradient (TD3) model. We also show that, given a learned RIS configuration, a winner-takes-all strategy is almost surely optimal for selecting the users, channels, and modulation-coding. Simulations show that the new approach converges quickly and realizes the benefit of the RIS, owing to its small state and action spaces. Because it requires no CSI, the approach is promising and of practical value.
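The winner-takes-all selection described above can be illustrated with a minimal sketch. It assumes one plausible reading of the strategy: for a fixed RIS configuration, each channel is given to the single user and modulation-coding level with the highest observed rate on that channel. The function name `winner_takes_all`, the array layout `rates[k, n, m]` (user, channel, modulation-coding level), and the use of NumPy are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def winner_takes_all(rates):
    """Pick one (user, modulation-coding) pair per channel.

    rates[k, n, m] is the observed data rate of user k on channel n with
    modulation-coding level m (hypothetical layout; 0 if decoding fails).
    Returns the winning user index and level per channel, and the sum rate.
    """
    rates = np.asarray(rates, dtype=float)
    num_users, num_channels, num_levels = rates.shape

    # Per channel, flatten all (user, level) pairs into one axis.
    flat = rates.transpose(1, 0, 2).reshape(num_channels, num_users * num_levels)

    # The winner on each channel is the pair with the largest rate.
    best = flat.argmax(axis=1)
    users, levels = np.divmod(best, num_levels)

    sum_rate = flat[np.arange(num_channels), best].sum()
    return users, levels, sum_rate
```

In this reading, the discrete selection reduces to a per-channel maximum once the rates under the current RIS configuration are known, so only the continuous RIS configuration would remain for the TD3 agent to learn.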