This paper investigates a master unmanned aerial vehicle (MUAV)-powered Internet of Things (IoT) network, in which we propose deploying a rechargeable auxiliary UAV (AUAV) equipped with an intelligent reflecting surface (IRS) to enhance the communication signals from the MUAV, while also leveraging the MUAV as a recharging power source for the AUAV. Under the proposed model, we investigate the optimal collaboration strategy of these energy-limited UAVs to maximize the accumulated throughput of the IoT network. Depending on whether charging between the two UAVs is allowed, two optimization problems are formulated. To solve them, two multi-agent deep reinforcement learning (DRL) approaches are proposed: centralized training multi-agent deep deterministic policy gradient (CT-MADDPG) and multi-agent deep deterministic policy option critic (MADDPOC). It is shown that CT-MADDPG can greatly reduce the requirements on the computing capability of the UAV hardware, and that the proposed MADDPOC supports low-level multi-agent cooperative learning in continuous action domains, giving it significant advantages over existing option-based hierarchical DRL methods, which support only single-agent learning and discrete actions.