Trajectory Design for UAV-Based Internet-of-Things Data Collection: A Deep Reinforcement Learning Approach

From arXiv; accepted by IEEE Internet of Things Journal. The code and other materials for this work may be available at https://gaozhen16.github.io/

In this paper, we investigate an unmanned aerial vehicle (UAV)-assisted Internet-of-Things (IoT) system in a complex three-dimensional (3D) environment, where the UAV's trajectory is optimized to efficiently collect data from multiple IoT ground nodes. Unlike existing approaches that consider only a simplified two-dimensional scenario and assume perfect channel state information (CSI), this paper considers a practical 3D urban environment with imperfect CSI, where the UAV's trajectory is designed to minimize the data collection completion time subject to practical throughput and flight movement constraints. Specifically, inspired by state-of-the-art deep reinforcement learning approaches, we leverage the twin-delayed deep deterministic policy gradient (TD3) to design the UAV's trajectory and present a TD3-based trajectory design for completion time minimization (TD3-TDCTM) algorithm. In particular, we introduce additional information, namely the merged pheromone, to represent the state of the UAV and the environment and to serve as a reference for the reward, which facilitates the algorithm design. By taking the service statuses of the IoT nodes, the UAV's position, and the merged pheromone as input, the proposed algorithm can continuously and adaptively learn how to adjust the UAV's movement strategy. By interacting with the external environment in the corresponding Markov decision process, the proposed algorithm can achieve a near-optimal navigation strategy. Our simulation results show the superiority of the proposed TD3-TDCTM algorithm over three conventional non-learning-based baseline methods.
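
The abstract names TD3 without detailing it. As a rough, framework-free illustration of what distinguishes TD3 from plain DDPG (clipped double-Q learning, target policy smoothing, and delayed, soft-updated targets), the sketch below computes the critic target for a single transition. Everything here is an assumption for illustration, not the authors' TD3-TDCTM code: the function names (td3_target, soft_update, is_policy_update_step) are hypothetical, the toy actor and critics stand in for neural networks, and the default hyperparameters are the common defaults from the original TD3 paper rather than values from this work. In the paper's setting the state would combine the nodes' service statuses, the UAV position, and the merged pheromone, whose exact encoding the abstract does not specify.

```python
import random
from typing import Callable, List, Optional, Sequence

Action = List[float]
Actor = Callable[[Sequence[float]], Action]
Critic = Callable[[Sequence[float], Sequence[float]], float]


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(
    reward: float,
    next_state: Sequence[float],
    done: bool,
    target_actor: Actor,
    target_critic_1: Critic,
    target_critic_2: Critic,
    gamma: float = 0.99,
    noise_std: float = 0.2,
    noise_clip: float = 0.5,
    action_low: float = -1.0,
    action_high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Critic regression target y for one transition (s, a, r, s', done)."""
    rng = rng if rng is not None else random.Random()
    # Target policy smoothing: perturb each target action component with
    # clipped Gaussian noise, then keep it inside the valid action range.
    smoothed = [
        clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip),
             action_low, action_high)
        for a in target_actor(next_state)
    ]
    # Clipped double-Q learning: take the smaller of the two target critics
    # to curb the overestimation bias of a single critic.
    q_next = min(target_critic_1(next_state, smoothed),
                 target_critic_2(next_state, smoothed))
    # No bootstrapping past a terminal state (e.g., all nodes served).
    return reward + (0.0 if done else gamma * q_next)


def soft_update(target: List[float], online: Sequence[float], tau: float = 0.005) -> None:
    """Polyak averaging, in place: target <- tau * online + (1 - tau) * target."""
    for i, w in enumerate(online):
        target[i] = tau * w + (1.0 - tau) * target[i]


def is_policy_update_step(step: int, policy_delay: int = 2) -> bool:
    """Delayed updates: actor and targets change once per `policy_delay` critic updates."""
    return step % policy_delay == 0


# Hypothetical toy networks standing in for the neural actor and critics.
def toy_actor(state: Sequence[float]) -> Action:
    return [0.5, -0.3]


def toy_critic_1(state: Sequence[float], action: Sequence[float]) -> float:
    return sum(state) + sum(action)


def toy_critic_2(state: Sequence[float], action: Sequence[float]) -> float:
    return sum(state) + sum(action) - 1.0


if __name__ == "__main__":
    y = td3_target(1.0, [1.0, 2.0], False, toy_actor, toy_critic_1, toy_critic_2,
                   gamma=0.9, noise_std=0.0)
    print(round(y, 4))  # 1.0 + 0.9 * min(3.2, 2.2)
```

In a full TD3 loop, both critics would be regressed toward y with a mean-squared error loss at every step, while the actor update and the soft_update calls on all three target networks would run only when is_policy_update_step returns True.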
