Video-based person re-identification (Re-ID) aims to match video tracklets, each composed of cropped video frames, in order to identify pedestrians across different cameras. However, these cropped tracklets suffer from severe spatial and temporal misalignment because the detection and tracking results that produced them came from outdated methods and are imperfect. To address this issue, we present a simple re-Detect and Link (DL) module that effectively reduces this noise by applying deep learning-based detection and tracking to the cropped tracklets. Furthermore, we introduce an improved model called Coarse-to-Fine Axial-Attention Network (CF-AAN). Starting from the typical Non-local Network, we replace the non-local module with three 1-D position-sensitive axial attentions and add our proposed coarse-to-fine structure. Compared with the original non-local operation, CF-AAN not only significantly reduces the computation cost but also achieves state-of-the-art performance (91.3% rank-1 and 86.5% mAP) on the large-scale MARS dataset. Surprisingly, simply adopting our DL module for data alignment allows several baseline models to achieve results better than or comparable to the current state of the art. In addition, we discover errors in both the identity labels of the tracklets and the evaluation protocol for the MARS test data. We hope that our work helps the community further develop invariant representations without the hassle of spatial and temporal alignment and dataset noise. The code, corrected labels, evaluation protocol, and aligned data will be available at https://github.com/jackie840129/CF-AAN.
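To give a rough sense of why replacing a non-local module with three 1-D axial attentions can reduce computation, the sketch below counts pairwise attention scores for a feature map of T frames, H rows, and W columns. It is only a back-of-the-envelope illustration, not the CF-AAN implementation: the function names and the example feature-map size are assumptions, and the count ignores the projection layers, the position-sensitive terms, and the coarse-to-fine structure.

```python
# Illustrative cost count only; names and sizes are hypothetical,
# not taken from the CF-AAN code.

def non_local_pairs(t: int, h: int, w: int) -> int:
    """Attention scores when every position attends to every position."""
    n = t * h * w
    return n * n


def axial_pairs(t: int, h: int, w: int) -> int:
    """Attention scores for three 1-D attentions, one per axis.

    Each position attends only to the positions that share its other
    two coordinates: t along time, h along height, w along width.
    """
    n = t * h * w
    return n * (t + h + w)


if __name__ == "__main__":
    t, h, w = 8, 16, 8  # assumed feature-map size, for illustration
    full = non_local_pairs(t, h, w)
    axial = axial_pairs(t, h, w)
    print(full, axial, full // axial)  # prints: 1048576 32768 32
```

Under these assumptions the number of attention scores drops from (THW)^2 to THW(T + H + W), a factor of THW / (T + H + W), which is 32 for the example size above.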