Intelligent decision-making for unmanned combat aerial vehicles (UCAVs) has long been a challenging problem. Conventional search methods can hardly satisfy the real-time demands of highly dynamic air combat scenarios. Reinforcement learning (RL) methods can significantly shorten the decision time by using neural networks. However, the sparse-reward problem limits their convergence speed, and an artificial prior-experience reward can easily deviate the policy from the optimal convergence direction of the original task, which raises great difficulties for applying RL to air combat. In this paper, we propose a homotopy-based soft actor-critic method (HSAC) that addresses these problems by following the homotopy path between the original task with the sparse reward and an auxiliary task with the artificial prior-experience reward. The convergence and feasibility of this method are also proved in this paper. To verify our method, we first construct a detailed 3D air combat simulation environment for training RL-based methods, and then apply our method to both the attack horizontal flight UCAV task and the self-play confrontation task. Experimental results show that our method outperforms methods that use only the sparse reward or only the artificial prior-experience reward: the agent trained by our method reaches a win rate of more than 98.3% in the attack horizontal flight UCAV task and an average win rate of 67.4% when confronted with agents trained by the other two methods.
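The abstract does not spell out the update rule, but the homotopy idea can be illustrated as a convex blend of the two reward signals whose coefficient is annealed over training, so the task deforms continuously from the auxiliary shaped-reward MDP to the original sparse-reward MDP. Below is a minimal sketch in Python; the helper names `homotopy_reward` and `homotopy_coeff`, the linear schedule, and the placeholder reward values are all illustrative assumptions, and the paper's actual HSAC schedule and convergence conditions may differ.

```python
def homotopy_reward(r_sparse: float, r_shaped: float, t: float) -> float:
    """Blend the two reward signals along the homotopy path.

    t = 0 recovers the auxiliary task (artificial prior-experience
    reward); t = 1 recovers the original sparse-reward task.
    """
    assert 0.0 <= t <= 1.0
    return t * r_sparse + (1.0 - t) * r_shaped


def homotopy_coeff(step: int, anneal_steps: int) -> float:
    """Hypothetical linear schedule: start on the auxiliary task and
    gradually hand control over to the original task during training."""
    return min(1.0, step / anneal_steps)


# Example: the blended reward is what the SAC critic would be trained on
# at this training step (reward values are placeholders).
t = homotopy_coeff(step=250_000, anneal_steps=500_000)
r = homotopy_reward(r_sparse=0.0, r_shaped=-0.1, t=t)
```

Under this reading, the blend keeps the training objective a continuous deformation between the two tasks, which is what makes convergence along the homotopy path plausible rather than the abrupt switch a hand-tuned curriculum would introduce.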