Thanks to cross-modal retrieval techniques, visible-infrared (RGB-IR) person re-identification (Re-ID) can be achieved by projecting the two modalities into a common space, enabling person Re-ID in 24-hour surveillance systems. However, with respect to probe-to-gallery matching, almost all existing RGB-IR cross-modal person Re-ID methods focus on image-to-image matching, while video-to-video matching, which contains much richer spatial and temporal information, remains under-explored. In this paper, we primarily study video-based cross-modal person Re-ID. To support this task, a video-based RGB-IR dataset is constructed, collecting 927 valid identities with 463,259 frames and 21,863 tracklets captured by 12 RGB/IR cameras. Based on our constructed dataset, we show that performance improves as the number of frames in a tracklet increases, demonstrating the significance of video-to-video matching in RGB-IR person Re-ID. Additionally, a novel method is proposed, which not only projects the two modalities into a modality-invariant subspace but also extracts temporal memory for motion invariance. Thanks to these two strategies, much better results are achieved on our video-based cross-modal person Re-ID dataset. The code and dataset are released at: https://github.com/VCMproject233/MITML.
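The two ingredients named above — projecting both modalities into a shared subspace and temporally aggregating frame features within a tracklet — can be sketched minimally as follows. This is an illustrative toy, not the paper's MITML architecture: the linear projections, feature dimensions, average pooling, and cosine matching are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame feature size and shared-embedding size (illustrative only).
FRAME_DIM, EMBED_DIM = 64, 32

# Modality-specific linear projections into a shared (modality-invariant) subspace.
W_rgb = rng.standard_normal((FRAME_DIM, EMBED_DIM)) / np.sqrt(FRAME_DIM)
W_ir = rng.standard_normal((FRAME_DIM, EMBED_DIM)) / np.sqrt(FRAME_DIM)

def embed_tracklet(frames: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Project each frame into the shared space, then temporally average-pool
    over the tracklet and L2-normalize the pooled embedding."""
    projected = frames @ W              # (T, EMBED_DIM)
    pooled = projected.mean(axis=0)     # temporal aggregation over T frames
    return pooled / np.linalg.norm(pooled)

def match_score(rgb_frames: np.ndarray, ir_frames: np.ndarray) -> float:
    """Cosine similarity between an RGB probe tracklet and an IR gallery tracklet."""
    return float(embed_tracklet(rgb_frames, W_rgb) @ embed_tracklet(ir_frames, W_ir))

# Toy tracklets with different frame counts (stand-ins for real CNN features).
probe = rng.standard_normal((8, FRAME_DIM))     # 8-frame RGB tracklet
gallery = rng.standard_normal((12, FRAME_DIM))  # 12-frame IR tracklet
print(match_score(probe, gallery))
```

In a real system the projections would be learned (e.g. with a cross-modal metric loss), and the temporal pooling would be replaced by the paper's temporal-memory mechanism; the sketch only shows why longer tracklets give the aggregation step more frames to average over.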