Visible-infrared person re-identification (VI-ReID) aims to retrieve the same pedestrian of interest across the visible and infrared modalities. Existing models mainly focus on compensating for modality-specific information to reduce modality variation. However, these methods often incur high computational overhead and may introduce interfering information when generating the corresponding images or features. To address these issues, it is critical to leverage pedestrian-attentive features and to learn modality-complete and modality-consistent representations. In this paper, we propose a novel Transferring Modality-Aware Pedestrian Attentive Learning (TMPA) model that focuses on pedestrian regions to efficiently compensate for missing modality-specific features. Specifically, we propose a region-based data augmentation module, PedMix, which enhances pedestrian region coherence by mixing the corresponding regions from different modalities. We further devise a lightweight hybrid compensation module, the Modality Feature Transfer (MFT), which integrates cross-attention and convolution networks to fully explore discriminative modality-complete features with minimal computational overhead. Extensive experiments conducted on the benchmark SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed TMPA model.
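To make the region-based augmentation idea concrete, the sketch below illustrates one plausible form of PedMix-style mixing: blending the pedestrian region of a visible image with the corresponding region of its paired infrared image. The abstract does not give the exact formulation, so the function name, the box-based region definition, and the blending weight `alpha` are all assumptions for illustration only.

```python
import torch

def pedmix(vis_img, ir_img, region, alpha=0.5):
    """Illustrative region-level mixing between a visible image and its
    paired infrared image (both tensors of shape [C, H, W]).

    `region` is a (top, left, height, width) pedestrian box; inside it the
    two modalities are blended with weight `alpha`, while pixels outside
    the box keep the visible image unchanged. This is a hypothetical
    sketch, not the paper's exact PedMix definition.
    """
    t, l, h, w = region
    mixed = vis_img.clone()
    mixed[:, t:t + h, l:l + w] = (
        alpha * vis_img[:, t:t + h, l:l + w]
        + (1.0 - alpha) * ir_img[:, t:t + h, l:l + w]
    )
    return mixed

# Usage: mix the central pedestrian region of a paired visible/infrared sample.
vis = torch.rand(3, 256, 128)
ir = torch.rand(3, 256, 128)
augmented = pedmix(vis, ir, region=(32, 16, 192, 96), alpha=0.5)
```

Mixing only the pedestrian region, rather than the whole image, keeps the background of the visible sample intact while exposing the model to infrared appearance cues exactly where the identity-relevant content lies.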