Existing studies of gait recognition are dominated by in-the-lab scenarios. Since people live in real-world scenes, gait recognition in the wild is a more practical problem, and it has recently attracted the attention of the multimedia and computer vision communities. Methods that obtain state-of-the-art performance on in-the-lab benchmarks achieve much lower accuracy on the recently proposed in-the-wild datasets, because they struggle to model the varied temporal dynamics of gait sequences in unconstrained scenes. This paper therefore presents a novel multi-hop temporal switch method for effective temporal modeling of gait patterns in real-world scenes. Concretely, we design a novel gait recognition network, named Multi-hop Temporal Switch Network (MTSGait), that learns spatial features and multi-scale temporal features simultaneously. Unlike existing methods that use 3D convolutions for temporal modeling, MTSGait models the temporal dynamics of gait sequences with 2D convolutions. As a result, it is more efficient, has fewer model parameters, and is easier to optimize than 3D convolution-based models. Owing to the specific design of its 2D convolution kernels, our method can eliminate the misalignment of features among adjacent frames. In addition, we propose a new sampling strategy, non-cyclic continuous sampling, so that the model learns more robust temporal features. Finally, the proposed method achieves superior performance compared with state-of-the-art methods on two public in-the-wild gait datasets, GREW and Gait3D.
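The abstract names non-cyclic continuous sampling without giving its details. The sketch below shows one plausible reading of the name: a training clip is a single contiguous run of frames that never wraps around to the start of the sequence. The function names, the `clip_len` parameter, and the handling of short sequences (kept whole rather than padded by repetition) are assumptions made for illustration, not the authors' implementation. A cyclic variant is included only for contrast.

```python
import random


def sample_clip(num_frames, clip_len, rng=random):
    """Return frame indices for one contiguous clip that never wraps around.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if num_frames <= 0 or clip_len <= 0:
        raise ValueError("num_frames and clip_len must be positive")
    if num_frames <= clip_len:
        # Assumption: a short sequence is used as-is instead of being
        # extended by repeating frames from its beginning.
        return list(range(num_frames))
    # randint is inclusive at both ends, so the clip always fits.
    start = rng.randint(0, num_frames - clip_len)
    return list(range(start, start + clip_len))


def sample_clip_cyclic(num_frames, clip_len, rng=random):
    """Contrast case: indices wrap past the last frame back to frame 0."""
    start = rng.randint(0, num_frames - 1)
    return [(start + i) % num_frames for i in range(clip_len)]


if __name__ == "__main__":
    random.seed(0)
    print(sample_clip(50, 30))         # one contiguous run inside [0, 50)
    print(sample_clip(10, 30))         # short sequence: all 10 frames
    print(sample_clip_cyclic(10, 30))  # wraps around several times
```

Under this reading, a wrapped clip places the last frame of a walk next to its first frame, which creates a temporal discontinuity that does not exist in the recording. Avoiding the wrap keeps every pair of adjacent sampled frames adjacent in the original sequence.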