This paper studies deep reinforcement learning (DRL) for the task scheduling problem of multiple unmanned aerial vehicles (UAVs). Current approaches generally solve the problem with exact or heuristic algorithms, but the computation time of these algorithms increases rapidly as the task scale grows, and heuristic rules must be designed by hand. As a self-learning method, DRL can obtain a high-quality solution quickly without hand-engineered rules. However, the huge decision space makes the training of DRL models unstable when the number of tasks is large. In this work, to address the large-scale problem, we develop a divide-and-conquer-based framework (DCF) that decouples the original problem into a task allocation subproblem and a UAV route planning subproblem, which are solved in the upper and lower layers, respectively. Based on DCF, a double-layer deep reinforcement learning approach (DL-DRL) is proposed, in which an upper-layer DRL model is designed to allocate tasks to appropriate UAVs and a lower-layer DRL model [i.e., the widely used attention model (AM)] is applied to generate viable UAV routes. Since the upper-layer model determines the input data distribution of the lower-layer model, and its reward is calculated via the lower-layer model during training, we develop an interactive training strategy (ITS), in which the whole training process consists of pre-training, intensive training, and alternate training stages. Experimental results show that our DL-DRL outperforms mainstream learning-based methods and most traditional methods, and is competitive with the state-of-the-art heuristic method [i.e., OR-Tools], especially on large-scale problems. The strong generalizability of DL-DRL is also verified by applying a model trained on one problem size to larger ones. Furthermore, an ablation study demonstrates that our ITS achieves a trade-off between model performance and training duration.
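The two-layer decomposition described above can be illustrated with a minimal sketch. It is not the DL-DRL method itself: both learned models are replaced by simple hand-written stand-ins, namely an angular-sector split for the upper-layer allocation and a nearest-neighbour ordering for the lower-layer routing. All function names (`allocate_tasks`, `plan_route`, `solve`), the single shared depot, and the total-route-length cost are assumptions made for illustration only; the abstract does not specify the objective or the problem constraints.

```python
import math
import random


def route_length(depot, tasks, order):
    """Length of the tour depot -> tasks in `order` -> depot."""
    pos, total = depot, 0.0
    for i in order:
        total += math.dist(pos, tasks[i])
        pos = tasks[i]
    return total + math.dist(pos, depot)


def allocate_tasks(depot, tasks, n_uavs):
    """Upper layer (stand-in for the learned allocation model):
    assign each task to a UAV by its angular sector around the depot."""
    groups = [[] for _ in range(n_uavs)]
    for idx, (x, y) in enumerate(tasks):
        angle = math.atan2(y - depot[1], x - depot[0]) % (2 * math.pi)
        sector = min(int(angle / (2 * math.pi) * n_uavs), n_uavs - 1)
        groups[sector].append(idx)
    return groups


def plan_route(depot, tasks, group):
    """Lower layer (stand-in for the attention model):
    order one UAV's tasks greedily by nearest neighbour."""
    remaining, pos, order = set(group), depot, []
    while remaining:
        nxt = min(remaining, key=lambda i: math.dist(pos, tasks[i]))
        order.append(nxt)
        remaining.remove(nxt)
        pos = tasks[nxt]
    return order


def solve(depot, tasks, n_uavs):
    """Divide and conquer: allocate first, then route each subset independently.
    The returned cost (total route length, an assumed objective) is the kind of
    signal from which an upper-layer reward could be derived."""
    groups = allocate_tasks(depot, tasks, n_uavs)
    routes = [plan_route(depot, tasks, g) for g in groups]
    cost = sum(route_length(depot, tasks, r) for r in routes)
    return routes, cost


if __name__ == "__main__":
    random.seed(0)
    demo_tasks = [(random.random(), random.random()) for _ in range(20)]
    demo_routes, demo_cost = solve((0.5, 0.5), demo_tasks, n_uavs=4)
    for uav, route in enumerate(demo_routes):
        print(f"UAV {uav}: {route}")
    print(f"total length: {demo_cost:.3f}")
```

The sketch shows why the two layers are coupled: the allocation step fixes the task subsets that the routing step receives, and the allocation can only be scored after the routing step has produced routes.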