This paper studies the problem of designing the Intelligent Reflecting Surface (IRS) phase shifters for Multiple-Input Single-Output (MISO) communication systems in spatiotemporally correlated channel environments, where the destination can move within a confined area. The objective is to maximize the expected sum of SNRs at the receiver over an infinite time horizon. The problem formulation gives rise to a Markov Decision Process (MDP). We propose a deep actor-critic algorithm that accounts for channel correlations and destination motion by constructing the state representation to include the current position of the receiver together with the phase shift values and receiver positions from a window of previous time steps. The channel variability induces high-frequency components in the spectrum of the underlying value function. We therefore propose preprocessing the critic's input with a Fourier kernel, which enables stable value learning. Finally, we investigate the use of the destination SNR as a component of the designed MDP state, which is common practice in previous work. We provide empirical evidence that, when the channels are spatiotemporally correlated, including the SNR in the state representation interacts with function approximation in ways that inhibit convergence.
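As an illustration of the two design choices described above (the windowed state and the Fourier preprocessing of the critic's input), the following is a minimal sketch rather than the authors' implementation. It assumes the "Fourier kernel" can be approximated by random Fourier features, i.e. sines and cosines of a fixed random projection; the names `build_state` and `FourierFeatures`, the window length, the number of IRS elements, and the feature scale are all hypothetical choices made for the example.

```python
import numpy as np


def build_state(current_pos, past_phases, past_positions):
    """Concatenate the current receiver position with a window of
    previous phase-shift vectors and previous receiver positions."""
    parts = [np.asarray(current_pos, dtype=float).ravel()]
    for phases, pos in zip(past_phases, past_positions):
        parts.append(np.asarray(phases, dtype=float).ravel())
        parts.append(np.asarray(pos, dtype=float).ravel())
    return np.concatenate(parts)


class FourierFeatures:
    """Random Fourier feature map (an assumed stand-in for the paper's
    Fourier kernel): x -> [sin(2*pi*Bx), cos(2*pi*Bx)] with fixed random B."""

    def __init__(self, input_dim, num_features, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # B is sampled once and kept fixed; it is not trained here.
        self.B = rng.normal(0.0, scale, size=(num_features, input_dim))

    def __call__(self, x):
        proj = 2.0 * np.pi * (self.B @ np.asarray(x, dtype=float))
        return np.concatenate([np.sin(proj), np.cos(proj)])


# Hypothetical sizes: 4 IRS elements, a window of 3 past steps, 2-D positions.
n_elements, window = 4, 3
current_pos = [0.5, 1.0]
past_phases = [np.zeros(n_elements) for _ in range(window)]
past_positions = [[0.4, 1.0], [0.3, 0.9], [0.2, 0.9]]

state = build_state(current_pos, past_phases, past_positions)
action = np.zeros(n_elements)  # candidate phase shifts proposed by the actor

# The critic would receive the Fourier-mapped (state, action) pair
# instead of the raw concatenation.
features = FourierFeatures(state.size + action.size, num_features=64)
critic_input = features(np.concatenate([state, action]))
```

In this sketch the mapped vector, not the raw state and action, would be passed to the critic network, so that rapidly varying components of the value function become easier to fit. The `scale` parameter controls which frequencies the map emphasizes and would need tuning for a given channel model.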