RSS-Based UAV-BS 3-D Mobility Management via Policy Gradient Deep Reinforcement Learning

We address the mobility management of an autonomous UAV-mounted base station (UAV-BS) that provides communication services to a cluster of users on the ground when the geographical characteristics of the cluster (e.g., location and boundary), the geographical locations of the users, and the characteristics of the radio environment are all unknown. The UAV-BS exploits only the received signal strengths (RSS) from the users and chooses its (continuous) 3-D speed accordingly so as to navigate constructively, i.e., to improve the transmitted data rate. To compensate for the lack of a model, we adopt policy gradient deep reinforcement learning. Because our approach does not rely on any particular information about the users or the radio environment, it is flexible and respects privacy concerns. Our experiments indicate that, despite the minimal information available, the UAV-BS is able to distinguish between high-rise (often non-line-of-sight dominant) and sub-urban (mainly line-of-sight dominant) environments: in the former (resp. latter) it tends to reduce (resp. increase) its height and to stay close to (resp. far from) the cluster. We further observe that the choice of the reward function affects the speed and the ability of the agent to adhere to the problem constraints, without affecting the delivered data rate.
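As a rough illustration of the idea, the sketch below pairs a linear-Gaussian policy, which maps an RSS vector to a continuous 3-D velocity, with a plain REINFORCE update. It is not the authors' implementation: the log-distance RSS model, the mean-RSS reward used as a stand-in for data rate, the minimum-altitude clip, and all names (`rss_dbm`, `GaussianPolicy`, `rollout`, `reinforce_update`) and parameter values are assumptions made for illustration, and a deep network would replace the linear map in practice.

```python
import numpy as np

def rss_dbm(uav_pos, user_pos, tx_power_dbm=20.0, path_loss_exp=2.5):
    """Toy log-distance RSS model (assumed; hidden from the agent)."""
    d = np.linalg.norm(user_pos - uav_pos, axis=1)
    return tx_power_dbm - 10.0 * path_loss_exp * np.log10(np.maximum(d, 1.0))

class GaussianPolicy:
    """Linear-Gaussian policy: velocity ~ N(W @ obs + b, sigma^2 I)."""
    def __init__(self, obs_dim, act_dim=3, sigma=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = np.zeros((act_dim, obs_dim))
        self.b = np.zeros(act_dim)
        self.sigma = sigma

    def mean(self, obs):
        return self.W @ obs + self.b

    def sample(self, obs):
        noise = self.rng.standard_normal(self.b.shape)
        return self.mean(obs) + self.sigma * noise

    def grad_log_prob(self, obs, act):
        # Gradient of log N(act; mu, sigma^2 I) w.r.t. mu is (act - mu) / sigma^2.
        delta = (act - self.mean(obs)) / self.sigma ** 2
        return np.outer(delta, obs), delta

def rollout(policy, users, start, steps=20, dt=1.0, min_alt=10.0):
    """Fly one episode; the agent observes only (scaled) RSS values."""
    pos = np.array(start, dtype=float)
    trajectory = []
    for _ in range(steps):
        obs = rss_dbm(pos, users) / 100.0          # crude scaling (assumed)
        act = policy.sample(obs)                   # continuous 3-D velocity
        pos = pos + dt * act
        pos[2] = max(pos[2], min_alt)              # assumed altitude constraint
        reward = rss_dbm(pos, users).mean() / 100.0  # proxy for data rate
        trajectory.append((obs, act, reward))
    return trajectory

def reinforce_update(policy, trajectory, lr=1e-3, gamma=0.99):
    """One REINFORCE step with a mean-return baseline."""
    rewards = np.array([r for _, _, r in trajectory], dtype=float)
    returns = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running     # discounted return-to-go
        returns[t] = running
    returns -= returns.mean()                      # simple variance reduction
    grad_W = np.zeros_like(policy.W)
    grad_b = np.zeros_like(policy.b)
    for (obs, act, _), G in zip(trajectory, returns):
        dW, db = policy.grad_log_prob(obs, act)
        grad_W += G * dW
        grad_b += G * db
    policy.W += lr * grad_W / len(trajectory)      # gradient ascent on return
    policy.b += lr * grad_b / len(trajectory)
```

A training loop would alternate `rollout` and `reinforce_update` over many episodes. Changing the reward inside `rollout`, for example by penalizing constraint violations instead of clipping the altitude, is where the sensitivity to the reward function noted in the abstract would show up.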
