Existing unsupervised person re-identification methods rely solely on visual cues to match pedestrians across different cameras. Since visual data is inherently susceptible to occlusion, blur, clothing changes, and similar degradations, a promising solution is to introduce heterogeneous data that compensates for these weaknesses. Some works based on full-scene labeling introduce wireless positioning to assist cross-domain person re-identification, but GPS-labeling an entire monitoring scene is laborious. To this end, we propose to explore unsupervised person re-identification with both visual data and wireless positioning trajectories under weak scene labeling, where only the locations of the cameras need to be known. Specifically, we propose a novel unsupervised multimodal training framework (UMTF), which models the complementarity of visual data and wireless information. UMTF contains a multimodal data association strategy (MMDA) and a multimodal graph neural network (MMGN). MMDA mines potential data associations in unlabeled multimodal data, while MMGN propagates multimodal messages through the video graph based on an adjacency matrix learned from histogram statistics of the wireless data. Thanks to the robustness of wireless data to visual noise and the collaboration of the various modules, UMTF can learn a model without any human labels on the data. Extensive experiments on two challenging datasets, WP-ReID and DukeMTMC-VideoReID, demonstrate the effectiveness of the proposed method.
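To make the MMGN idea concrete, the snippet below is a minimal sketch, not the authors' implementation: every name (`histogram_adjacency`, `propagate`, `dist_seqs`) and every design choice (histogram intersection as the pairwise similarity, a single row-normalized propagation step with a residual mixing weight `alpha`) is an illustrative assumption. It only shows the general pattern of deriving a soft adjacency matrix from histogram statistics of wireless distance sequences and using it to propagate visual features across a video graph.

```python
import numpy as np

def histogram_adjacency(dist_seqs, n_bins=16, d_max=50.0):
    """Soft adjacency over video tracklets from histogram statistics of
    wireless data (a sketch; the paper's exact statistics may differ).

    dist_seqs: list of 1-D arrays; dist_seqs[i] holds the distances (in
    meters, assumed) between tracklet i's candidate wireless trajectory
    and the recording camera over the tracklet's time span.
    """
    bins = np.linspace(0.0, d_max, n_bins + 1)
    hists = []
    for d in dist_seqs:
        h, _ = np.histogram(np.clip(d, 0.0, d_max), bins=bins)
        hists.append(h / max(h.sum(), 1))        # probability histogram
    hists = np.stack(hists)                      # (N, n_bins)
    # Histogram intersection as pairwise similarity -> soft adjacency.
    adj = np.minimum(hists[:, None, :], hists[None, :, :]).sum(-1)
    np.fill_diagonal(adj, 0.0)                   # self-loop added in propagate
    return adj                                   # (N, N)

def propagate(features, adj, alpha=0.5):
    """One message-passing step: mix each tracklet's visual feature with
    the adjacency-weighted mean of its neighbors' features."""
    adj_norm = adj / (adj.sum(1, keepdims=True) + 1e-8)  # row-stochastic
    return alpha * features + (1.0 - alpha) * adj_norm @ features

# Toy usage: 4 tracklets, 128-d visual features, synthetic distance sequences.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 128))
seqs = [rng.uniform(0, 50, size=30) for _ in range(4)]
smoothed = propagate(feats, histogram_adjacency(seqs))
```

Tracklets whose wireless trajectories keep similar distance profiles to the cameras receive large adjacency weights, so their visual features are pulled together during propagation; this is what makes the aggregation robust to purely visual noise.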