We use reinforcement meta-learning to optimize a line-of-sight curvature policy that increases the effectiveness of a guidance system against maneuvering targets. The policy is implemented as a recurrent neural network that maps navigation system outputs to an Euler 321 attitude representation. This attitude representation is used to construct a direction cosine matrix that biases the observed line-of-sight vector. The guidance system then maps the line-of-sight rotation rate derived from the biased line of sight to a commanded acceleration. By varying the bias as a function of navigation system outputs, the policy improves accuracy against highly maneuvering targets. Importantly, our method does not require an estimate of target acceleration. In our experiments, we demonstrate that when our method is combined with proportional navigation, the resulting system significantly outperforms augmented proportional navigation given perfect knowledge of target acceleration, achieving better accuracy with less control effort against a wide range of target maneuvers.
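The signal path described above (Euler 321 angles, direction cosine matrix, biased line of sight, rotation rate, commanded acceleration) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the finite-difference rate estimate, the navigation gain of 3, and the true proportional navigation form of the command are all assumptions made for illustration, and the learned policy is replaced by a fixed tuple of angles.

```python
import numpy as np

def dcm_from_euler321(yaw, pitch, roll):
    """Direction cosine matrix for a 3-2-1 (yaw, pitch, roll) Euler sequence."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    return np.array([
        [cp * cy,                cp * sy,                -sp],
        [sr * sp * cy - cr * sy, sr * sp * sy + cr * cy, sr * cp],
        [cr * sp * cy + sr * sy, cr * sp * sy - sr * cy, cr * cp],
    ])

def biased_los(r_rel, euler321):
    """Unit line-of-sight vector rotated by the bias DCM.

    r_rel is target position minus pursuer position; euler321 stands in
    for the policy output (hypothetical interface).
    """
    lam = r_rel / np.linalg.norm(r_rel)
    return dcm_from_euler321(*euler321) @ lam

def los_rate(lam_prev, lam_curr, dt):
    """Finite-difference estimate of the line-of-sight rotation rate (rad/s).

    Assumes the rotation between the two unit vectors is small.
    """
    return np.cross(lam_prev, lam_curr) / dt

def pn_command(lam, omega, closing_speed, nav_gain=3.0):
    """Proportional navigation: a = N * Vc * (omega x lam)."""
    return nav_gain * closing_speed * np.cross(omega, lam)

if __name__ == "__main__":
    dt = 0.01
    bias = (0.0, 0.0, 0.0)  # zero bias recovers plain proportional navigation
    r0 = np.array([1000.0, 0.0, 0.0])
    v_rel = np.array([-100.0, 10.0, 0.0])  # target velocity minus pursuer velocity
    r1 = r0 + v_rel * dt
    lam0, lam1 = biased_los(r0, bias), biased_los(r1, bias)
    omega = los_rate(lam0, lam1, dt)
    closing_speed = -np.dot(r1, v_rel) / np.linalg.norm(r1)
    print(pn_command(lam1, omega, closing_speed))
```

With zero Euler angles the direction cosine matrix is the identity, so the sketch reduces to ordinary proportional navigation. Angles that vary over time curve the apparent line of sight, which is the degree of freedom the policy would control.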