Integrated and Adaptive Guidance and Control for Endoatmospheric Missiles via Reinforcement Learning

We apply the meta reinforcement learning framework to optimize an integrated and adaptive guidance and flight control system for an air-to-air missile, implementing the system as a deep neural network (the policy). The policy maps observations directly to commanded rates of change for the missile's control surface deflections. The observations are derived with minimal processing from three sources: the computationally stabilized line-of-sight unit vector measured by a strapdown seeker, the rotational velocity estimated from rate gyros, and the control surface deflection angles. The system induces intercept trajectories against a maneuvering target that satisfy control constraints on fin deflection angles and path constraints on look angle and load. We test the optimized system in a six-degrees-of-freedom (6-DOF) simulator that includes a non-linear radome model and a strapdown seeker model. Through extensive simulation, we demonstrate that the system can adapt to a large flight envelope and to off-nominal flight conditions that include perturbation of aerodynamic coefficient parameters and center of pressure locations. Moreover, we find that the system is robust to the parasitic attitude loop induced by radome refraction, imperfect seeker stabilization, and sensor scale factor errors. Finally, we compare our system's performance to two benchmarks: a proportional navigation guidance system in a simplified 3-DOF environment, which we take as an upper bound on the performance attainable with separate guidance and flight control systems, and a longitudinal model of proportional navigation coupled with a three-loop autopilot. We find that our system moderately outperforms the former and outperforms the latter by a large margin.
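For readers unfamiliar with the proportional navigation benchmark named above, the following is a minimal sketch of the textbook vector form of the law, in which the commanded acceleration is proportional to the closing speed times the line-of-sight rotation rate. It is not the benchmark implementation from this work: the function name `pn_acceleration`, the helper functions, and the navigation gain of 3 are all assumptions made for illustration.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def pn_acceleration(rel_pos, rel_vel, nav_gain=3.0):
    """Textbook proportional navigation command (illustrative only).

    rel_pos: target position minus missile position (m)
    rel_vel: target velocity minus missile velocity (m/s)
    nav_gain: navigation gain N (assumed value, not from the source)
    Returns the commanded acceleration, perpendicular to the line of sight.
    """
    rng = math.sqrt(dot(rel_pos, rel_pos))
    los = tuple(x / rng for x in rel_pos)                # line-of-sight unit vector
    los_rate = tuple(x / rng ** 2 for x in cross(rel_pos, rel_vel))  # LOS rotation rate (rad/s)
    closing_speed = -dot(rel_pos, rel_vel) / rng         # positive when range is shrinking
    direction = cross(los_rate, los)                     # direction that nulls the LOS rate
    return tuple(nav_gain * closing_speed * x for x in direction)


# Target 1 km ahead, closing at 300 m/s and drifting sideways at 50 m/s.
accel = pn_acceleration((1000.0, 0.0, 0.0), (-300.0, 50.0, 0.0))
```

In this example the command points in the direction of the target's sideways drift. A law of this kind only produces an acceleration command, so a separate autopilot must turn it into fin deflections, whereas the policy described in the abstract outputs deflection rates directly.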
