Online multi-object tracking (MOT) is a longstanding task in computer vision and on intelligent vehicle platforms. The dominant paradigm at present is tracking-by-detection, whose main difficulty is associating current candidate detections with historical tracklets. In MOT scenarios, however, each historical tracklet is composed of a sequence of object images, while each candidate detection is only a single flat image that lacks the temporal features of an object sequence. This feature difference between current candidate detections and historical tracklets makes object association much harder. We therefore propose a Spatial-Temporal Mutual Representation Learning (STURE) approach, which learns spatial-temporal representations between the current candidate detection and the historical sequence in a mutual representation space. For the historical tracklets, the detection learning network is forced to match the representations of the sequence learning network in the mutual representation space. With various designed losses, the proposed approach extracts more discriminative detection and sequence representations for object association. As a result, spatial-temporal features are learned mutually to reinforce the current detection features, and the feature difference is relieved. To demonstrate the robustness of STURE, we apply it to the public MOT challenge benchmarks, where it performs well against various state-of-the-art online MOT trackers on identity-preserving metrics.
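The core idea of forcing the detection branch to match the sequence branch in a mutual representation space can be sketched minimally as follows. This is an illustrative toy, not the paper's implementation: the actual networks are deep feature extractors, whereas here linear projections (`W_det`, `W_seq`), mean temporal pooling, and a simple MSE matching loss are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and projection weights; the paper's networks are
# learned CNNs, replaced here by fixed linear maps for illustration only.
D_IN, D_EMB = 128, 64
W_det = rng.standard_normal((D_IN, D_EMB)) / np.sqrt(D_IN)  # detection branch
W_seq = rng.standard_normal((D_IN, D_EMB)) / np.sqrt(D_IN)  # sequence branch

def embed_detection(x):
    """Project one flat detection feature (D_IN,) into the mutual space."""
    return x @ W_det

def embed_sequence(seq):
    """Project a tracklet (T, D_IN) into the mutual space; temporal mean
    pooling stands in for the sequence learning network."""
    return seq.mean(axis=0) @ W_seq

def mutual_matching_loss(det, seq):
    """MSE between the two embeddings: training on this loss would push the
    detection branch to reproduce the sequence branch's spatial-temporal
    representation, shrinking the feature difference for association."""
    d, s = embed_detection(det), embed_sequence(seq)
    return float(np.mean((d - s) ** 2))

det = rng.standard_normal(D_IN)             # current candidate detection
tracklet = rng.standard_normal((5, D_IN))   # 5-frame historical tracklet
loss = mutual_matching_loss(det, tracklet)
```

At association time, both a candidate detection and each historical tracklet land in the same embedding space, so a single distance metric can compare them despite one being a single image and the other a sequence.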