We address the problem of visible-infrared person re-identification (VI-reID), that is, retrieving a set of person images, captured by either visible or infrared cameras, in a cross-modal setting. Two main challenges in VI-reID are intra-class variations across person images and cross-modal discrepancies between visible and infrared images. Assuming that the person images are roughly aligned, previous approaches attempt to learn coarse image- or rigid part-level person representations that are discriminative and generalizable across different modalities. However, person images, typically cropped by off-the-shelf object detectors, are not necessarily well-aligned, which hinders discriminative person representation learning. In this paper, we introduce a novel feature learning framework that addresses these problems in a unified way. To this end, we propose to exploit dense correspondences between cross-modal person images. This allows us to address cross-modal discrepancies at the pixel level, suppressing modality-related features from person representations more effectively. It also encourages pixel-wise associations between cross-modal local features, further facilitating discriminative feature learning for VI-reID. Extensive experiments and analyses on standard VI-reID benchmarks demonstrate the effectiveness of our approach, which significantly outperforms the state of the art.
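To make the idea of pixel-level cross-modal matching concrete, the sketch below shows one common way such a dense-correspondence objective can be realized; it is a minimal illustration under our own assumptions, not the paper's exact formulation. The names `feat_vis`, `feat_ir`, and the `temperature` parameter are hypothetical: local features from a shared backbone are softly matched across modalities via cosine similarity, infrared features are warped into the visible layout, and matched locations are encouraged to agree so that modality-specific content is suppressed.

```python
import torch
import torch.nn.functional as F

def dense_correspondence_loss(feat_vis, feat_ir, temperature=0.05):
    """Illustrative pixel-level cross-modal matching loss (a sketch,
    not the authors' exact objective).

    feat_vis, feat_ir: (B, C, H, W) local feature maps of the same
    identities, extracted by a shared backbone from visible and
    infrared images, respectively.
    """
    B, C, H, W = feat_vis.shape
    # Flatten spatial dims and L2-normalize so dot products are cosine similarities.
    v = F.normalize(feat_vis.flatten(2), dim=1)   # (B, C, HW)
    r = F.normalize(feat_ir.flatten(2), dim=1)    # (B, C, HW)
    # Pairwise similarity between every visible and every infrared location.
    sim = torch.bmm(v.transpose(1, 2), r)         # (B, HW, HW)
    # Soft dense correspondence: each visible pixel attends to infrared pixels.
    attn = F.softmax(sim / temperature, dim=-1)   # (B, HW, HW)
    # Warp infrared features into the visible image's spatial layout.
    r_warped = torch.bmm(attn, r.transpose(1, 2)).transpose(1, 2)  # (B, C, HW)
    # Pixel-wise consistency: corresponding locations should carry the same
    # identity-related content regardless of modality.
    return (1.0 - (v * r_warped).sum(dim=1)).mean()
```

In practice a term of this kind would be combined with standard identity-classification and triplet losses; the soft-warping step is what lets the objective tolerate the misaligned crops produced by off-the-shelf detectors.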