Weakly supervised person search aims to perform joint pedestrian detection and re-identification (re-id) using only person bounding-box annotations. Contrastive learning has recently been applied to weakly supervised person search, where the two common contrast strategies are memory-based contrast and intra-image contrast. We argue that current intra-image contrast is shallow and therefore suffers from spatial-level and occlusion-level variance. In this paper, we present a novel deep intra-image contrastive learning method built on a Siamese network. Its two key modules are spatial-invariant contrast (SIC) and occlusion-invariant contrast (OIC). SIC performs many-to-one contrasts between the two branches of the Siamese network and dense prediction contrasts within one branch. With these many-to-one and dense contrasts, SIC tends to learn discriminative scale-invariant and location-invariant features, which addresses spatial-level variance. OIC enforces feature consistency under a masking strategy to learn occlusion-invariant features. Extensive experiments are performed on two person search datasets, CUHK-SYSU and PRW. Our method achieves state-of-the-art performance among weakly supervised one-step person search approaches. We hope that our simple intra-image contrastive learning can inspire further paradigms for weakly supervised person search. The source code is available at \url{https://github.com/jiabeiwangTJU/DICL}.
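As a rough illustration of the feature-consistency idea behind OIC, the following sketch compares the pooled feature of a person region before and after part of it is masked out. It is not the authors' implementation: the toy grid of feature vectors, the zero-masking of a rectangle, the average pooling, the cosine-based loss, and all function names (`mask_patch`, `pool`, `consistency_loss`) are assumptions made for illustration only.

```python
import math
import random


def mask_patch(feature_map, top, left, height, width):
    """Return a copy of a 2D grid of feature vectors with one rectangle zeroed.

    Hypothetical stand-in for an occlusion mask; the original grid is not modified.
    """
    masked = [[list(vec) for vec in row] for row in feature_map]
    for r in range(top, min(top + height, len(masked))):
        for c in range(left, min(left + width, len(masked[r]))):
            masked[r][c] = [0.0] * len(masked[r][c])
    return masked


def pool(feature_map):
    """Average-pool the grid of feature vectors into a single vector."""
    cells = [vec for row in feature_map for vec in row]
    dim = len(cells[0])
    return [sum(vec[d] for vec in cells) / len(cells) for d in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; defined as 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def consistency_loss(feature_map, top, left, height, width):
    """Assumed consistency loss: 1 - cos(unmasked feature, masked feature)."""
    full = pool(feature_map)
    occluded = pool(mask_patch(feature_map, top, left, height, width))
    return 1.0 - cosine_similarity(full, occluded)


# Toy 4x4 grid of 8-dimensional features (random values, purely illustrative).
rng = random.Random(0)
features = [[[rng.random() for _ in range(8)] for _ in range(4)] for _ in range(4)]

loss = consistency_loss(features, top=1, left=1, height=2, width=2)
print(f"consistency loss with a 2x2 mask: {loss:.4f}")
```

Minimizing a loss of this kind pushes the masked and unmasked views of the same person toward the same feature, which is the sense in which the learned representation becomes less sensitive to occlusion. In a real detector the mask would more plausibly be applied before feature extraction and the loss back-propagated through the network; this sketch only shows the comparison step.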