Learning representative, robust and discriminative information from images is essential for effective person re-identification (Re-Id). In this paper, we propose a compound approach for end-to-end discriminative deep feature learning for person Re-Id based on both body and hand images. We carefully design the Local-Aware Global Attention Network (LAGA-Net), a multi-branch deep network architecture consisting of one branch for spatial attention, one branch for channel attention, one branch for global feature representations and another branch for local feature representations. The attention branches focus on the relevant features of the image while suppressing irrelevant backgrounds. To overcome a weakness of attention mechanisms, namely their equivariance to pixel shuffling, we integrate relative positional encodings into the spatial attention module to capture the spatial positions of pixels. The global branch is intended to preserve the global context or structural information. For the local branch, which is intended to capture fine-grained information, we perform uniform partitioning to generate horizontal stripes on the conv-layer. We retrieve the parts by conducting a soft partition without explicitly partitioning the images or requiring external cues such as pose estimation. A set of ablation studies shows that each component contributes to the increased performance of the LAGA-Net. Extensive evaluations on four popular body-based person Re-Id benchmarks and two publicly available hand datasets demonstrate that our proposed method consistently outperforms existing state-of-the-art methods.
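The uniform horizontal partitioning used by the local branch can be illustrated with a minimal sketch. This is not the LAGA-Net implementation: the function name `horizontal_stripe_features`, the use of average pooling within each stripe, the feature-map shape and the number of stripes are all assumptions chosen for illustration. The sketch only shows how a convolutional feature map, rather than the input image, can be split into equal-height stripes that each yield one local descriptor.

```python
import numpy as np


def horizontal_stripe_features(feature_map, num_stripes):
    """Split a (C, H, W) feature map into equal-height horizontal stripes
    and average-pool each stripe into one C-dimensional vector.

    Hypothetical helper for illustration; average pooling is an assumption.
    """
    channels, height, width = feature_map.shape
    if height % num_stripes != 0:
        raise ValueError("height must be divisible by num_stripes")
    stripe_height = height // num_stripes
    # (C, H, W) -> (C, num_stripes, stripe_height, W): rows are grouped by stripe.
    stripes = feature_map.reshape(channels, num_stripes, stripe_height, width)
    # Average over the rows and columns of each stripe, then put stripes first.
    return stripes.mean(axis=(2, 3)).T  # shape: (num_stripes, C)


# Assumed toy sizes: 8 channels, a 24x8 spatial grid, 6 stripes.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 24, 8))
local_features = horizontal_stripe_features(feature_map, num_stripes=6)
```

Here `local_features` holds one row per stripe, so each row could be passed to its own classifier or embedding head. Because the split operates on the feature map, no image cropping or pose estimation is involved in this sketch.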