We apply a reinforcement meta-learning framework to optimize an integrated and adaptive guidance and flight control system for an air-to-air missile. The system is implemented as a policy that maps navigation system outputs directly to commanded rates of change for the missile's control surface deflections. The system induces intercept trajectories against a maneuvering target that satisfy control constraints on fin deflection angles, and path constraints on look angle and load. We test the optimized system in a six degrees-of-freedom simulator that includes a non-linear radome model and a strapdown seeker model, and demonstrate that the system adapts to both a large flight envelope and off-nominal flight conditions including perturbation of aerodynamic coefficient parameters and center of pressure locations, and flexible body dynamics. Moreover, we find that the system is robust to the parasitic attitude loop induced by radome refraction and imperfect seeker stabilization. We compare our system's performance to a longitudinal model of proportional navigation coupled with a three loop autopilot, and find that our system outperforms this benchmark by a large margin. Additional experiments investigate the impact of removing the recurrent layer from the policy and value function networks, performance with an infrared seeker, and flexible body dynamics.
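To make the policy interface concrete, the sketch below shows a minimal recurrent policy mapping navigation-system observations to bounded commanded fin-deflection rates, which are then integrated into deflections subject to a hard limit. All dimensions, limits, and the Elman-style recurrence are illustrative assumptions; this is not the authors' trained network or their reinforcement meta-learning procedure.

```python
import numpy as np

class RecurrentPolicy:
    """Minimal Elman-style recurrent policy: navigation observations ->
    commanded fin deflection rates. Sizes and weights are hypothetical;
    the paper's actual architecture and training are not reproduced here."""

    def __init__(self, obs_dim=6, hidden_dim=16, n_fins=4,
                 rate_limit=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.normal(0.0, 0.1, (hidden_dim, obs_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.W_a = rng.normal(0.0, 0.1, (n_fins, hidden_dim))
        self.h = np.zeros(hidden_dim)          # recurrent state (history)
        self.rate_limit = rate_limit           # rad/s, assumed

    def act(self, obs):
        # Recurrent update lets the policy adapt to unobserved dynamics
        # (the role the abstract attributes to the recurrent layer).
        self.h = np.tanh(self.W_x @ obs + self.W_h @ self.h)
        # tanh squashing keeps the rate command within +/- rate_limit.
        return self.rate_limit * np.tanh(self.W_a @ self.h)

def step_fins(deflections, rates, dt=0.01, limit=0.35):
    """Integrate commanded rates into deflections, clipped to an
    assumed +/-0.35 rad fin-deflection constraint."""
    return np.clip(deflections + rates * dt, -limit, limit)

# Closed-loop rollout with placeholder navigation outputs.
policy = RecurrentPolicy()
defl = np.zeros(4)
for _ in range(200):
    obs = 0.1 * np.ones(6)                     # stand-in for nav outputs
    defl = step_fins(defl, policy.act(obs))
```

The key design point mirrored from the abstract is that the policy commands deflection *rates* rather than deflections directly, so actuator-rate and deflection-angle constraints can both be enforced at the integration step.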