While the Vision Transformer has been used in gait recognition, its application to multi-view gait recognition remains limited. Differences in view significantly affect how accurately gait contour features can be extracted and identified. To address this, this paper proposes a Siamese Mobile Vision Transformer (SMViT). The model attends both to local features of the human gait space and to long-range attention associations, allowing it to extract multi-dimensional gait state features. In addition, it describes how different views affect gait features and generates reliable view-feature relationship factors. The average recognition rate of SMViT on the CASIA B dataset reached 96.4%. Experimental results show that SMViT attains state-of-the-art performance compared with advanced gait recognition models such as GaitGAN, Multi_view GAN, and Posegait.