In this paper, we are interested in learning a generalizable person re-identification (re-ID) representation from unlabeled videos. Compared with 1) the popular unsupervised re-ID setting, where the training and test sets are typically from the same domain, and 2) the popular domain generalization (DG) re-ID setting, where the training samples are labeled, our novel scenario combines their key challenges: the training samples are unlabeled, and they are collected from various domains that do not align with the test domain. In other words, we aim to learn a representation in an unsupervised manner and directly use the learned representation for re-ID in novel domains. To fulfill this goal, we make two main contributions. First, we propose Cycle Association (CycAs), a scalable self-supervised learning method for re-ID with low training complexity. Second, we construct a large-scale unlabeled re-ID dataset named LMP-video, tailored for the proposed method. Specifically, CycAs learns re-ID features by enforcing cycle consistency of instance association between temporally successive video frame pairs, and its training cost is merely linear in the data size, making large-scale training possible. The LMP-video dataset, in turn, is extremely large, containing 50 million unlabeled person images cropped from over 10K YouTube videos, and is therefore sufficient to serve as fertile soil for self-supervised learning. We show that CycAs, trained on LMP-video, generalizes well to novel domains. The achieved results sometimes even outperform supervised domain generalizable models. Remarkably, CycAs achieves 82.2\% Rank-1 on Market-1501 and 49.0\% Rank-1 on MSMT17 with zero human annotation, surpassing state-of-the-art supervised DG re-ID methods. We also demonstrate the superiority of CycAs under the canonical unsupervised re-ID and the pretrain-and-finetune scenarios.
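
As a rough illustration of what cycle consistency of instance association can mean, the following minimal Python sketch associates person embeddings from one frame to the next and back again, then penalizes any instance that does not return to itself. It is only a sketch under stated assumptions, not the loss used by CycAs: the function names, the softmax temperature, the cosine similarity, and the negative-log-diagonal penalty are all hypothetical choices made for this example.

```python
import math


def normalize(vec):
    """Scale a feature vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(row, temperature):
    """Numerically stable softmax with a temperature (assumed hyperparameter)."""
    peak = max(row)
    exps = [math.exp((x - peak) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cycle_consistency_loss(feats_a, feats_b, temperature=0.1):
    """Hypothetical cycle-consistency loss between two successive frames.

    feats_a, feats_b: lists of embedding vectors, one per detected person.
    Returns (loss, cycle), where cycle[i][j] is the soft probability that
    instance i of frame A maps to frame B and back to instance j of frame A.
    """
    a = [normalize(v) for v in feats_a]
    b = [normalize(v) for v in feats_b]

    # Cosine similarity between every instance pair across the two frames.
    sim = [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]
    sim_t = [list(col) for col in zip(*sim)]

    # Soft assignments: frame A -> frame B, and frame B -> frame A.
    forward = [softmax(row, temperature) for row in sim]
    backward = [softmax(row, temperature) for row in sim_t]

    # Compose the two assignments to obtain the A -> B -> A cycle matrix.
    n, m = len(a), len(b)
    cycle = [
        [sum(forward[i][k] * backward[k][j] for k in range(m)) for j in range(n)]
        for i in range(n)
    ]

    # A consistent cycle is close to the identity, so penalize the diagonal.
    loss = -sum(math.log(cycle[i][i]) for i in range(n)) / n
    return loss, cycle


# Two people whose order is swapped in the second frame: the cycle closes.
matched_loss, _ = cycle_consistency_loss([[1.0, 0.0], [0.0, 1.0]],
                                         [[0.0, 1.0], [1.0, 0.0]])

# Indistinguishable embeddings in the second frame: the cycle is ambiguous.
ambiguous_loss, _ = cycle_consistency_loss([[1.0, 0.0], [0.0, 1.0]],
                                           [[1.0, 1.0], [1.0, 1.0]])
```

In this toy setup the loss is close to zero when each person can be matched unambiguously across the frame pair, and it grows when the embeddings cannot tell people apart. No identity labels are involved, and each frame pair is processed independently of the rest of the data.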