Reconfigurable intelligent surfaces (RISs) mounted on unmanned aerial vehicles (UAVs) can reshape wireless propagation on demand. However, their performance is sensitive to UAV jitter and to uncertainty in the cascaded channel. This paper investigates a downlink multiple-input single-output (MISO) UAV-mounted RIS system in which a ground multiple-antenna base station (BS) serves multiple single-antenna users under practical impairments. Our goal is to maximize the expected throughput under stochastic three-dimensional UAV jitter and imperfect cascaded channel state information (CSI), using only the available channel estimates. This leads to a stochastic nonconvex optimization problem subject to a BS transmit power constraint and strict unit-modulus constraints on all RIS elements. To address this problem, we design a model-free deep reinforcement learning (DRL) framework with a contextual bandit formulation. A differentiable feasibility layer maps continuous actions to feasible solutions, and the reward is a Monte Carlo estimate of the expected throughput. We instantiate this framework with constrained variants of deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3) that do not use target networks. Simulations show that, under severe jitter and low CSI quality, the proposed algorithms yield higher throughput than conventional alternating optimization-based weighted minimum mean-square error (AO-WMMSE) baselines. Across different scenarios, the proposed methods perform comparably to or slightly below the AO-WMMSE benchmark based on sample average approximation (SAA), with a relative gap of 0% to 12%. Moreover, the proposed DRL controllers achieve online inference times of 0.6 ms per decision, versus roughly 370 to 550 ms for the AO-WMMSE solvers.
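As a rough illustration of the two ingredients named in the abstract, the sketch below maps an unconstrained real-valued action to a beamforming matrix that respects the power budget and to unit-modulus RIS coefficients, and then scores the result with a Monte Carlo average of the sum rate. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`project_action`, `sum_rate`, `monte_carlo_reward`), the layout of the action vector, the phase parameterization, and the additive Gaussian CSI error model. UAV jitter is not modelled, and the code is plain Python rather than an autodiff framework, so it shows only the mapping and not the gradient flow.

```python
import cmath
import math
import random


def project_action(raw, num_antennas, num_users, num_elements, p_max):
    """Map a raw action to (W, theta) that satisfies both constraints.

    Assumed layout: the first 2*M*K entries hold the real and imaginary
    parts of W (M x K); the last N entries are read as RIS phases.
    """
    m, k, n = num_antennas, num_users, num_elements
    if len(raw) != 2 * m * k + n:
        raise ValueError("raw action has the wrong length")
    w = [[complex(raw[2 * (i * k + j)], raw[2 * (i * k + j) + 1])
          for j in range(k)] for i in range(m)]
    power = sum(abs(x) ** 2 for row in w for x in row)
    # Rescale only when the transmit power budget is exceeded.
    scale = math.sqrt(p_max / power) if power > p_max else 1.0
    w = [[x * scale for x in row] for row in w]
    # exp(j * phase) has modulus 1 by construction.
    theta = [cmath.exp(1j * math.pi * a) for a in raw[2 * m * k:]]
    return w, theta


def sum_rate(w, theta, cascaded, noise_power):
    """Sum rate in bit/s/Hz; cascaded[u][n][a] is user u's cascaded channel."""
    m, k = len(w), len(w[0])
    rate = 0.0
    for u in range(k):
        # Effective BS-to-user channel once the RIS applies theta.
        h = [sum(theta[n] * cascaded[u][n][a] for n in range(len(theta)))
             for a in range(m)]
        gains = [abs(sum(h[a] * w[a][j] for a in range(m))) ** 2
                 for j in range(k)]
        interference = sum(g for j, g in enumerate(gains) if j != u)
        rate += math.log2(1.0 + gains[u] / (interference + noise_power))
    return rate


def monte_carlo_reward(w, theta, cascaded_est, err_std, noise_power,
                       num_samples, rng):
    """Average the sum rate over sampled channels around the estimate.

    Assumed error model: independent Gaussian noise with standard
    deviation err_std on each real and imaginary component.
    """
    total = 0.0
    for _ in range(num_samples):
        sample = [[[c + complex(rng.gauss(0.0, err_std),
                                rng.gauss(0.0, err_std)) for c in row]
                   for row in user] for user in cascaded_est]
        total += sum_rate(w, theta, sample, noise_power)
    return total / num_samples
```

In a contextual bandit setting, the channel estimate would play the role of the context, the actor's raw output would pass through `project_action`, and the value returned by `monte_carlo_reward` would be the one-step reward. Always rescaling W to full power is another plausible choice for the power constraint.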


