Existing gait recognition methods either build a Global Feature Representation (GFR) directly from the original gait sequences or generate a Local Feature Representation (LFR) from several local parts. However, GFR tends to neglect local details of human posture because the receptive fields grow larger in deeper network layers. LFR lets the network focus on the detailed posture information of each local region, but it neglects the relations among different local parts and thus exploits only the limited local information of a few specific regions. To address these issues, we propose a global-local gait recognition network, named GaitGL, that generates more discriminative feature representations. Specifically, a novel Global and Local Convolutional Layer (GLCL) is developed to take full advantage of both global visual information and local region details in each layer. GLCL is a dual-branch structure that consists of a GFR extractor and a mask-based LFR extractor. The GFR extractor aims to extract contextual information, e.g., the relationship among various body parts, while the mask-based LFR extractor exploits the detailed posture changes of local regions. In addition, we introduce a novel mask-based strategy to improve the local feature extraction capability: we design pairs of complementary masks to randomly occlude feature maps, and then train the mask-based LFR extractor on the resulting occluded feature maps. In this manner, the LFR extractor learns to fully exploit local information. Extensive experiments demonstrate that GaitGL achieves better performance than state-of-the-art gait recognition methods. The average rank-1 accuracies on CASIA-B, OU-MVLP, GREW and Gait3D are 93.6%, 98.7%, 68.0% and 63.8%, respectively, significantly outperforming the competing methods. The proposed method won first prize in two competitions: HID 2020 and HID 2021.
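The complementary-mask idea can be illustrated with a minimal sketch. The abstract does not specify the mask shape or how the masks are sampled, so everything below is an assumption made for illustration: the masks occlude a random horizontal band of rows, the feature map is a plain 2D list, and the names `complementary_row_masks` and `apply_mask` are hypothetical and not taken from the GaitGL implementation.

```python
import random


def complementary_row_masks(height, width, band, rng):
    """Return two binary masks that sum to all-ones at every position.

    Assumption: mask_a zeroes a random horizontal band of `band` rows,
    and mask_b keeps only that band. The real mask shape may differ.
    """
    start = rng.randint(0, height - band)  # randint is inclusive at both ends
    mask_a = [
        [0.0 if start <= r < start + band else 1.0] * width
        for r in range(height)
    ]
    mask_b = [[1.0 - v for v in row] for row in mask_a]
    return mask_a, mask_b


def apply_mask(feature_map, mask):
    """Element-wise product of a 2D feature map and a 2D mask."""
    return [
        [f * m for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(feature_map, mask)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Toy 6x4 "feature map" with distinct non-zero values.
feature_map = [[float(r * 4 + c + 1) for c in range(4)] for r in range(6)]

mask_a, mask_b = complementary_row_masks(6, 4, band=2, rng=rng)
occluded_a = apply_mask(feature_map, mask_a)  # band hidden
occluded_b = apply_mask(feature_map, mask_b)  # only the band visible
```

Because the two masks are complementary, every position of the feature map is visible in exactly one of the two occluded copies. A local extractor trained on both copies would therefore see every region across the pair while never seeing the full map at once.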