Unmanned aerial vehicles (UAVs) are expected to be an integral part of wireless networks, and determining collision-free trajectories for multiple UAVs while satisfying requirements of connectivity with ground base stations (GBSs) is a challenging task. In this paper, we first reformulate the multi-UAV trajectory optimization problem with collision avoidance and wireless connectivity constraints as a sequential decision-making problem in the discrete time domain. We then propose a decentralized deep reinforcement learning approach to solve the problem. More specifically, a value network is developed to encode the expected time to destination given the agent's joint state (including the agent's information, the nearby agents' observable information, and the locations of the nearby GBSs). A signal-to-interference-plus-noise ratio (SINR)-prediction neural network is also designed, using accumulated SINR measurements obtained when interacting with the cellular network, to map the GBSs' locations to SINR levels and thereby predict the UAV's SINR. Numerical results show that, with the value network and the SINR-prediction network, real-time navigation for multiple UAVs can be performed efficiently in various environments with a high success rate.
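As a rough illustration of how a per-agent decision step could combine the two networks, the following minimal sketch picks, among candidate velocities, the one with the smallest estimated time to destination that is both collision-free and predicted to stay connected. It is not the paper's implementation: `predict_sinr_db` and `time_to_goal` are hypothetical closed-form stand-ins for the trained SINR-prediction and value networks, and the names, the constant-velocity neighbor model, the 2-D geometry, and all thresholds are assumptions made for illustration.

```python
import math

def predict_sinr_db(pos, gbs_locations, noise=1e-9):
    """Toy stand-in for the SINR-prediction network (assumption): the strongest
    GBS serves the UAV, all others interfere, received power ~ 1/d^2."""
    powers = [1.0 / max(math.dist(pos, g), 1.0) ** 2 for g in gbs_locations]
    signal = max(powers)
    interference = sum(powers) - signal
    return 10.0 * math.log10(signal / (interference + noise))

def time_to_goal(pos, goal, v_max):
    """Toy stand-in for the value network (assumption): straight-line time."""
    return math.dist(pos, goal) / v_max

def select_velocity(pos, goal, neighbors, gbs_locations, candidates,
                    dt=1.0, v_max=1.0, safe_dist=1.0, sinr_min_db=-3.0):
    """Return the feasible candidate velocity with the smallest estimated
    time to destination, or None if every candidate is infeasible."""
    best, best_time = None, math.inf
    for vx, vy in candidates:
        nxt = (pos[0] + vx * dt, pos[1] + vy * dt)
        # Each neighbor is (position, velocity); it is assumed to keep its
        # observed velocity for one time step.
        collides = any(
            math.dist(nxt, (p[0] + v[0] * dt, p[1] + v[1] * dt)) < safe_dist
            for p, v in neighbors
        )
        # Discard moves that collide or that violate the connectivity constraint.
        if collides or predict_sinr_db(nxt, gbs_locations) < sinr_min_db:
            continue
        t = dt + time_to_goal(nxt, goal, v_max)
        if t < best_time:
            best, best_time = (vx, vy), t
    return best

moves = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
print(select_velocity((0.0, 0.0), (5.0, 0.0), [], [(0.0, 10.0)], moves))
```

Because each agent evaluates only its own state, its neighbors' observable states, and nearby GBS locations, a step of this shape can run independently on every UAV, which is what makes the scheme decentralized.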