Reinforcement learning (RL) has recently proven itself as a powerful tool for solving complex problems and has even surpassed human performance in several challenging applications. This suggests that RL algorithms can be applied to the autonomous air combat problem, which has been studied for many years. The complexity of air combat arises from aggressive close-range maneuvers and agile enemy behaviors. In addition, real-life scenarios involve uncertainties due to sensor errors, which prevent accurate estimation of the enemy's position. An autonomous aircraft should therefore succeed even in noisy environments. In this study, we develop an air combat simulation that provides noisy observations to the agents, making the air combat problem even more challenging. To cope with this noise, we present a state stacking method for noisy RL environments as a noise reduction technique. In an extensive set of experiments, the proposed method significantly outperforms the baseline algorithms in terms of winning ratio, and the performance improvement is even more pronounced at high noise levels. In addition, we incorporate a self-play scheme into our training process by periodically updating the enemy with a frozen copy of the training agent. In this way, the training agent fights against enemies with progressively smarter strategies, which improves the performance and robustness of the agents. In our simulations, we demonstrate that the self-play scheme provides significant performance gains compared to classical RL training.
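To make the state stacking idea concrete, the sketch below shows one plausible realization: an environment wrapper that feeds the agent a concatenation of the last k noisy observations, so the policy network can average out transient sensor noise. The Gym-style `reset`/`step` interface, the wrapper name, and the stack size k are illustrative assumptions, not the paper's exact configuration.

```python
from collections import deque

import numpy as np


class StateStackingWrapper:
    """Hypothetical wrapper: observations become stacks of the last k noisy states."""

    def __init__(self, env, k=4):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        # Fill the stack with the first observation so the input shape is fixed.
        for _ in range(self.k):
            self.frames.append(obs)
        return self._stacked()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return self._stacked(), reward, done, info

    def _stacked(self):
        # Concatenate along the feature axis: shape (k * obs_dim,).
        return np.concatenate(list(self.frames), axis=-1)
```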
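The self-play scheme can likewise be sketched in a few lines: every fixed number of episodes, the enemy policy is replaced by a frozen snapshot of the current training agent. The `run_training_episode` callback, the episode counts, and the update interval below are hypothetical placeholders standing in for the paper's actual training loop.

```python
import copy


def self_play_training(agent, env, run_training_episode,
                       num_episodes=10_000, update_interval=500):
    """Train `agent` against periodically refreshed frozen copies of itself."""
    enemy = copy.deepcopy(agent)  # initial frozen opponent
    for episode in range(num_episodes):
        # One air-combat episode against the frozen enemy; only `agent`
        # receives gradient updates inside this (assumed) callback.
        run_training_episode(agent, enemy, env)
        if (episode + 1) % update_interval == 0:
            # Replace the opponent with a snapshot of the current learner,
            # so the agent keeps facing its own latest strategy.
            enemy = copy.deepcopy(agent)
    return agent
```

Freezing the opponent between updates keeps the learning target stationary within each interval, which is one common way to stabilize self-play training.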