Unmanned aerial base stations (UABSs) can be deployed in vehicular wireless networks to support applications such as extended sensing via vehicle-to-everything (V2X) services. A key problem in such systems is designing algorithms that can efficiently optimize the trajectory of the UABS in order to maximize coverage. In existing solutions, such optimization is carried out from scratch for any new traffic configuration, often by means of conventional reinforcement learning (RL). In this paper, we propose the use of continual meta-RL as a means to transfer information from previously experienced traffic configurations to new conditions, with the goal of reducing the time needed to optimize the UABS's policy. Adopting the Continual Meta Policy Search (CoMPS) strategy, we demonstrate significant efficiency gains as compared to conventional RL, as well as to naive transfer learning methods.
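To make the workflow concrete, below is a minimal, illustrative sketch (not the paper's implementation) of a continual meta-RL loop in the spirit described above: traffic configurations arrive one at a time, the UABS policy is adapted to each by ordinary policy-gradient RL starting from a meta-learned initialization, and that initialization is then updated from past solutions so later adaptations start closer to good policies. The toy coverage reward, all function names, and the averaging meta-update are assumptions made purely for illustration; CoMPS itself uses a more elaborate self-imitation meta-optimization step.

```python
# Illustrative continual meta-RL sketch (assumptions, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

def coverage_reward(waypoint, hotspot):
    """Toy stand-in for V2X coverage: higher when the UABS waypoint
    is close to the current traffic hotspot."""
    return -np.linalg.norm(waypoint - hotspot)

def adapt_policy(mean_init, hotspot, steps=200, lr=0.05, sigma=0.3, batch=16):
    """Per-task RL: REINFORCE on the mean of a Gaussian waypoint policy,
    starting from the meta-learned initialization."""
    mean = mean_init.copy()
    for _ in range(steps):
        actions = mean + sigma * rng.standard_normal((batch, 2))
        rewards = np.array([coverage_reward(a, hotspot) for a in actions])
        advantages = rewards - rewards.mean()
        # Policy-gradient estimate for the Gaussian mean.
        grad = (advantages[:, None] * (actions - mean)).mean(axis=0) / sigma**2
        mean += lr * grad
    return mean

# Continual stream of tasks (traffic configurations), revealed sequentially.
meta_mean = np.zeros(2)   # meta-policy: initialization shared across tasks
solved_means = []         # per-task solutions kept for meta-updates

for task_id in range(10):
    hotspot = rng.uniform(-1.0, 1.0, size=2)      # new traffic configuration
    task_mean = adapt_policy(meta_mean, hotspot)  # fast per-task adaptation
    solved_means.append(task_mean)
    # Meta-update: move the initialization toward past solutions
    # (a simplification of CoMPS's self-imitation meta-optimization).
    meta_mean = np.mean(solved_means, axis=0)
    print(f"task {task_id}: reward after adaptation "
          f"{coverage_reward(task_mean, hotspot):.3f}")
```

In this sketch the efficiency gain shows up as fewer adaptation steps needed for later tasks, since the meta-learned initialization is already close to good waypoints for typical hotspots; the paper's setting replaces the toy reward with UABS trajectory optimization for V2X coverage.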