Most current multi-object trackers focus on short-term tracking and rely on deep, complex systems that often cannot run in real time, which makes them impractical for video surveillance. In this paper we present a long-term, multi-face tracking architecture designed for crowded contexts where the face is often the only visible part of a person. Our system builds on advances in face detection and face recognition to achieve long-term tracking, and is particularly unconstrained by the motion and occlusions of people. It follows a tracking-by-detection approach, combining a fast short-term visual tracker with a novel online tracklet reconnection strategy grounded in rank-based face verification. The proposed rank-based constraint favours a higher inter-class distance among tracklets and reduces the propagation of errors caused by wrong reconnections. Additionally, a correction module revises past assignments at no extra computational cost. We present a series of experiments that introduce novel specialized metrics for evaluating long-term tracking capabilities, and we publicly release a dataset of 10 manually annotated videos with a total length of 8 min 54 s. Our findings validate the robustness of each of the proposed modules and demonstrate that, in these challenging contexts, our approach yields up to 50% longer tracks than state-of-the-art deep learning trackers.