Visual attention learning (VAL) aims to produce a confidence map, used as weights, to detect discriminative features in each image for a given task such as vehicle re-identification (ReID), where the same vehicle instance must be identified across different cameras. In contrast to the literature, in this paper we propose using self-supervised learning to regularize VAL and thereby improve vehicle ReID performance. Mathematically, using lifting, we can factorize the two functions of VAL and self-supervised regularization through another shared function. We implement this factorization with a deep learning framework consisting of three branches: (1) a global branch serving as the backbone for image feature extraction, (2) an attentional branch for producing attention masks, and (3) a self-supervised branch for regularizing the attention learning. Our network design naturally leads to end-to-end multi-task joint optimization. We conduct comprehensive experiments on three benchmark datasets for vehicle ReID, i.e., VeRi-776, CityFlow-ReID, and VehicleID. We demonstrate the state-of-the-art (SOTA) performance of our approach and its ability to capture informative vehicle parts without corresponding manual labels. We also demonstrate the good generalization of our approach to other ReID tasks such as person ReID and multi-target multi-camera tracking.
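To make the three-branch design concrete, the following is a minimal sketch, not the authors' implementation: it assumes a ResNet-50 trunk as the shared global branch, a small convolutional head as the attentional branch producing a spatial confidence map, and a rotation-prediction head as a placeholder self-supervised branch. The class name `ThreeBranchReID`, the pretext task, and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch of a shared backbone with attention and self-supervised branches.
import torch.nn as nn
import torchvision.models as models


class ThreeBranchReID(nn.Module):
    def __init__(self, num_ids, num_pretext_classes=4, feat_dim=2048):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # (1) Global branch: shared convolutional trunk for image feature extraction.
        self.global_branch = nn.Sequential(*list(backbone.children())[:-2])
        # (2) Attentional branch: predicts a spatial confidence map over the feature map.
        self.attention_branch = nn.Sequential(
            nn.Conv2d(feat_dim, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # (3) Self-supervised branch: classifies a pretext label (here, rotation angle).
        self.ssl_head = nn.Linear(feat_dim, num_pretext_classes)
        # ReID classifier over attention-weighted features.
        self.id_head = nn.Linear(feat_dim, num_ids)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        feat = self.global_branch(x)                  # (B, C, H, W)
        attn = self.attention_branch(feat)            # (B, 1, H, W) confidence map
        weighted = self.pool(feat * attn).flatten(1)  # attention-weighted descriptor
        global_feat = self.pool(feat).flatten(1)      # plain global descriptor
        return self.id_head(weighted), self.ssl_head(global_feat), attn
```

Under this sketch, end-to-end multi-task joint optimization would combine a ReID loss on the attention-weighted descriptor with a pretext-task loss on the shared features, e.g., `loss = reid_loss + lam * pretext_loss`, so that the self-supervised branch regularizes what the attention branch learns.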