Connectivity-aware path design is crucial to the effective deployment of autonomous Unmanned Aerial Vehicles (UAVs). Recently, Reinforcement Learning (RL) algorithms have become a popular approach to solving this type of complex problem, but they suffer from slow convergence. In this paper, we propose a Transfer Learning (TL) approach, in which a teacher policy previously trained in an old domain is used to boost the agent's path learning in a new domain. As exploration and training continue, the agent refines its path design in the new domain based on its subsequent interactions with the environment. We evaluate our approach with an old domain at sub-6 GHz and a new domain at millimeter Wave (mmWave). The teacher path policy, previously trained at sub-6 GHz, is the solution to a connectivity-aware path problem that we formulate as a constrained Markov Decision Process (CMDP). To solve the path design at sub-6 GHz, we employ a Lyapunov-based model-free Deep Q-Network (DQN) that guarantees satisfaction of the connectivity constraint. We empirically demonstrate the effectiveness of our approach in different urban environment scenarios. The results show that the proposed approach considerably reduces the training time at mmWave.
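The abstract does not state how the teacher policy is injected into the agent's learning in the new domain. As a minimal sketch of one common option, and purely as an assumption, the agent could follow the teacher's suggested action with a probability `beta` that decays over episodes, and otherwise act epsilon-greedily on its own Q-value estimates. The function names, the geometric decay schedule, and the parameters `beta0` and `decay` are hypothetical and are not taken from the paper.

```python
import random


def select_action(student_q, teacher_action, beta, epsilon, rng=random):
    """Pick one action in the new domain (illustrative, not the paper's method).

    student_q      : list of the student's Q-value estimates, one per action
    teacher_action : action suggested by the teacher policy (assumed interface)
    beta           : probability of following the teacher's suggestion
    epsilon        : probability of a uniformly random exploratory action
    """
    # With probability beta, reuse the knowledge from the old domain.
    if rng.random() < beta:
        return teacher_action
    # Otherwise explore at random with probability epsilon ...
    if rng.random() < epsilon:
        return rng.randrange(len(student_q))
    # ... or exploit the student's current estimates.
    return max(range(len(student_q)), key=lambda a: student_q[a])


def teacher_probability(episode, beta0=1.0, decay=0.99):
    """Assumed geometric decay of the teacher-following probability."""
    return beta0 * decay ** episode
```

Under this assumed schedule, the agent relies heavily on the teacher early in training and increasingly on its own interactions with the new environment later on, which matches the refinement behaviour described in the abstract.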