Active tracking of a space noncooperative object that relies solely on a vision camera is of great significance for autonomous rendezvous and debris removal. Because the task is a Partially Observable Markov Decision Process (POMDP), this paper proposes a novel tracker based on deep recurrent reinforcement learning, named RAMAVT, which drives the chasing spacecraft to follow an arbitrary space noncooperative object with high-frequency, near-optimal velocity control commands. To further improve active tracking performance, we introduce a Multi-Head Attention (MHA) module and a Squeeze-and-Excitation (SE) layer into RAMAVT, which remarkably improve the representational ability of the neural network at almost no extra computational cost. Extensive experiments and an ablation study on the SNCOAT benchmark show the effectiveness and robustness of our method compared with other state-of-the-art algorithms. The source code is available at https://github.com/Dongzhou-1996/RAMAVT.
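To illustrate why a Squeeze-and-Excitation layer adds little computation, the following is a minimal NumPy sketch of the generic SE operation (global average pooling, a small bottleneck, and per-channel gating). It is not taken from the RAMAVT code base: the function name `se_layer`, the reduction ratio of 4, the feature-map shape, and the random weights are all assumptions chosen for illustration, and biases are omitted for brevity.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def se_layer(features, w_reduce, w_expand):
    """Generic Squeeze-and-Excitation gating (illustrative, not RAMAVT's code).

    features: array of shape (C, H, W)
    w_reduce: array of shape (C // r, C), the bottleneck projection
    w_expand: array of shape (C, C // r), the expansion projection
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = features.mean(axis=(1, 2))              # shape (C,)
    # Excitation: bottleneck with ReLU, then sigmoid gates in (0, 1).
    hidden = np.maximum(w_reduce @ squeezed, 0.0)      # shape (C // r,)
    gates = sigmoid(w_expand @ hidden)                 # shape (C,)
    # Scale: reweight every channel of the input by its gate.
    return features * gates[:, None, None], gates


# Hypothetical sizes, chosen only for this example.
rng = np.random.default_rng(0)
channels, reduction = 8, 4
features = rng.standard_normal((channels, 5, 5))
w_reduce = rng.standard_normal((channels // reduction, channels)) * 0.1
w_expand = rng.standard_normal((channels, channels // reduction)) * 0.1

scaled, gates = se_layer(features, w_reduce, w_expand)
```

In this sketch the layer adds only the two small projection matrices, 2·C²/r weights in total, and its output has the same shape as its input, so it can be placed after a convolutional stage without changing the surrounding network.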