Terminal Adaptive Guidance for Autonomous Hypersonic Strike Weapons via Reinforcement Learning

An adaptive guidance system suitable for the terminal phase trajectory of a hypersonic strike weapon is optimized using reinforcement meta-learning. The guidance system maps observations directly to commanded bank angle, angle of attack, and sideslip angle rates. Importantly, the observations are directly measurable from radar seeker outputs with minimal processing. The optimization framework implements a shaping reward that minimizes the line-of-sight rotation rate, with a terminal reward given if the agent satisfies path constraints and meets terminal accuracy and speed criteria. We show that the guidance system can adapt to off-nominal flight conditions including perturbation of aerodynamic coefficient parameters, actuator failure scenarios, sensor scale factor errors, and actuator lag, while satisfying path constraints on heating rate, dynamic pressure, and load, as well as a minimum impact speed constraint. We demonstrate precision strike capability against a maneuvering ground target and the ability to divert to a new target, the latter being important to maximize strike effectiveness for a group of hypersonic strike weapons. Moreover, we demonstrate a threat evasion strategy against interceptors with limited midcourse correction capability, where the hypersonic strike weapon implements multiple diverts to alternate targets, with the last divert to the actual target. Finally, we include preliminary results for an integrated guidance and control system in a six-degrees-of-freedom environment.
