Unmanned aerial vehicles (UAVs) are widely used in military operations. In this paper, we formulate the autonomous motion control (AMC) problem as a Markov decision process (MDP) and propose an advanced deep reinforcement learning (DRL) method that allows UAVs to execute complex tasks in large-scale dynamic three-dimensional (3D) environments. To overcome the limitations of the prioritized experience replay (PER) algorithm and improve performance, the proposed asynchronous curriculum experience replay (ACER) uses multiple threads to update priorities asynchronously, assigns true priorities, and applies a temporary experience pool so that higher-quality experiences are available for learning. A first-in-useless-out (FIUO) experience pool is also introduced to ensure that the stored experiences retain a higher use value. In addition, combined with curriculum learning (CL), a more reasonable training paradigm that samples experiences from simple to difficult is designed for training UAVs. Trained in a complex unknown environment constructed from the parameters of a real UAV, the proposed ACER improves the convergence speed by 24.66\% and the convergence result by 5.59\% compared to the state-of-the-art twin delayed deep deterministic policy gradient (TD3) algorithm. Testing experiments carried out in environments of different complexity demonstrate the strong robustness and generalization ability of the ACER agent.
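To make the two replay-buffer ideas named above concrete, the following is a minimal Python sketch of a FIUO-style pool combined with simple-to-difficult curriculum sampling. It is an illustration under stated assumptions, not the paper's implementation: the class and method names (`FIUOPool`, `sample_curriculum`), the use of a TD-error-style scalar as the "use value", and the choice of treating low-priority experiences as "simple" are all hypothetical.

```python
import heapq
import random
from dataclasses import dataclass, field


@dataclass(order=True)
class Experience:
    """One stored transition; 'priority' stands in for a TD-error-based
    use value, and 'data' would hold the (s, a, r, s', done) tuple."""
    priority: float
    data: tuple = field(compare=False, default=())


class FIUOPool:
    """Illustrative first-in-useless-out pool: when full, evict the
    lowest-priority ("least useful") experience rather than the oldest
    one, as a plain FIFO replay buffer would."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._heap: list[Experience] = []  # min-heap keyed on priority

    def add(self, priority: float, transition: tuple) -> None:
        exp = Experience(priority, transition)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, exp)
        elif priority > self._heap[0].priority:
            # Pool is full: replace the least useful experience.
            heapq.heapreplace(self._heap, exp)

    def sample_curriculum(self, batch_size: int, progress: float) -> list:
        """Curriculum-style sampling: with progress in [0, 1], early
        training (progress ~ 0) draws from the 'simple' end of the pool
        and late training (progress ~ 1) from the 'difficult' end.
        Equating low priority with 'simple' is an assumption made
        purely for this sketch."""
        ordered = sorted(self._heap)                 # ascending priority
        window = max(batch_size, len(ordered) // 2)  # sliding sample window
        start = int(progress * max(0, len(ordered) - window))
        segment = ordered[start:start + window]
        k = min(batch_size, len(segment))
        return [e.data for e in random.sample(segment, k)]


if __name__ == "__main__":
    # Tiny smoke test with synthetic priorities.
    pool = FIUOPool(capacity=1000)
    for i in range(2000):
        pool.add(priority=random.random(), transition=(i,))
    early_batch = pool.sample_curriculum(batch_size=4, progress=0.0)
    late_batch = pool.sample_curriculum(batch_size=4, progress=1.0)
    print(early_batch, late_batch)
```

In this sketch the eviction rule (keep the highest-priority experiences) captures the FIUO idea of preserving use value, while the moving sample window captures the CL idea of progressing from simple to difficult; the paper's actual priority definition, asynchronous multithreaded updates, and temporary pool are not modeled here.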