Person re-identification (ReID) systems can achieve high accuracy when trained on large, fully labeled image datasets. However, the domain shift typically associated with diverse operational capture conditions (e.g., camera viewpoints and lighting) may translate into a significant decline in performance. This paper focuses on unsupervised domain adaptation (UDA) for video-based ReID, a relevant scenario that is less explored in the literature. In this scenario, the ReID model must adapt, based on tracklet information, to a complex target domain defined by a network of diverse video cameras. State-of-the-art methods cluster unlabeled target data, yet domain shifts across target cameras (sub-domains) can lead to poor initialization of clustering methods that propagates noise across epochs, thus preventing the ReID model from accurately associating samples of the same identity. In this paper, a UDA method is introduced for video person ReID that leverages knowledge of video tracklets and of the distribution of frames captured over target cameras to improve the performance of CNN backbones trained using pseudo-labels. Our method relies on an adversarial approach, where a camera-discriminator network is introduced to extract discriminant camera-independent representations, facilitating the subsequent clustering. In addition, a weighted contrastive loss is proposed to leverage the confidence of clusters and mitigate the risk of incorrect identity associations. Experimental results obtained on three challenging video-based person ReID datasets (PRID2011, iLIDS-VID, and MARS) indicate that our proposed method can outperform related state-of-the-art methods. Our code is available at: https://github.com/dmekhazni/CAWCL-ReID
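The abstract does not state the exact form of the weighted contrastive loss, so the following is only a minimal sketch of the general idea under stated assumptions: each sample carries a scalar cluster confidence, the per-anchor term is a standard supervised-contrastive (InfoNCE-style) loss over cosine similarities, and anchors are averaged with their confidences as weights. The function names, the `temperature` parameter, and the weighting scheme are hypothetical and are not taken from the CAWCL-ReID repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_contrastive_loss(features, pseudo_labels, confidences, temperature=0.1):
    """Confidence-weighted contrastive loss over pseudo-labeled samples.

    Assumption (not from the paper): confidences[i] is a non-negative scalar
    expressing how reliable the cluster assignment of sample i is.
    """
    n = len(features)
    total, weight_sum = 0.0, 0.0
    for i in range(n):
        # Positives: other samples sharing the anchor's pseudo-label.
        positives = [j for j in range(n)
                     if j != i and pseudo_labels[j] == pseudo_labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarities to every other sample.
        logits = {j: cosine(features[i], features[j]) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values()))
        # Mean negative log-probability of the positives for this anchor.
        anchor_loss = -sum(logits[j] - log_denom for j in positives) / len(positives)
        # Low-confidence anchors contribute less to the final loss.
        total += confidences[i] * anchor_loss
        weight_sum += confidences[i]
    return total / weight_sum if weight_sum > 0 else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05]]
    labels = [0, 0, 1, 1]  # the last sample is plausibly mis-clustered
    print(weighted_contrastive_loss(feats, labels, [1.0, 1.0, 1.0, 1.0]))
    print(weighted_contrastive_loss(feats, labels, [1.0, 1.0, 1.0, 0.1]))
```

In this toy setup, lowering the confidence of the sample that sits far from its assigned cluster reduces its influence on the loss, which is the intuition behind using cluster confidence to limit the effect of incorrect identity associations.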