Gait recognition is a rapidly advancing vision technique for identifying people at a distance. Prior studies predominantly employed relatively small, shallow neural networks to extract subtle gait features and achieved impressive success in indoor settings. However, experiments reveal that these existing methods mostly produce unsatisfactory results when applied to newly released in-the-wild gait datasets. This paper presents a unified perspective on how to construct deep models for state-of-the-art outdoor gait recognition, covering both classical CNN-based and emerging Transformer-based architectures. Based on this exploration, we emphasize the importance of suitable network capacity, explicit temporal modeling, and deep transformer structure for discriminative gait representation learning. Our proposed CNN-based DeepGaitV2 series and Transformer-based SwinGait series exhibit significant performance gains in outdoor scenarios, e.g., about +30% rank-1 accuracy compared with many state-of-the-art methods on the challenging GREW dataset. This work is expected to further boost the research and application of gait recognition. Code will be available at https://github.com/ShiqiYu/OpenGait.
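For readers unfamiliar with the rank-1 accuracy metric quoted above, the following is a minimal sketch of how it is commonly computed: each probe embedding is matched to its nearest gallery embedding, and the match counts as correct when the identities agree. The function name `rank1_accuracy`, the use of Euclidean distance, and the toy embeddings are assumptions made for illustration only; the evaluation protocol actually used for GREW may differ in its details.

```python
import numpy as np


def rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose nearest gallery embedding shares their identity.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    probe_feats = np.asarray(probe_feats, dtype=float)
    gallery_feats = np.asarray(gallery_feats, dtype=float)
    # Pairwise Euclidean distances, shape (num_probes, num_gallery).
    diffs = probe_feats[:, None, :] - gallery_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Index of the closest gallery sample for every probe.
    nearest = np.argmin(dists, axis=1)
    hits = np.asarray(gallery_ids)[nearest] == np.asarray(probe_ids)
    return float(hits.mean())


# Toy 2-D embeddings (made up): three gallery identities, four probes.
gallery_feats = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
gallery_ids = [0, 1, 2]
probe_feats = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.1], [4.0, 4.5]]
probe_ids = [0, 1, 2, 2]

acc = rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids)
print(f"rank-1 accuracy: {acc:.2f}")
```

In this toy example three of the four probes retrieve a gallery sample with the correct identity; the third probe lies closest to identity 0 although it is labeled 2, so it counts as a miss.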