The main challenge of online multi-object tracking is to reliably associate object trajectories with detections in each video frame based on their tracking history. In this work, we propose the Recurrent Autoregressive Network (RAN), a temporal generative modeling framework to characterize the appearance and motion dynamics of multiple objects over time. The RAN couples an external memory and an internal memory. The external memory explicitly stores previous inputs of each trajectory in a time window, while the internal memory learns to summarize long-term tracking history and associate detections by processing the external memory. We conduct experiments on the MOT 2015 and 2016 datasets to demonstrate the robustness of our tracking method in highly crowded and occluded scenes. Our method achieves top-ranked results on the two benchmarks.
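To make the external/internal memory coupling concrete, below is a minimal, hypothetical sketch of how one RAN-style tracklet could score candidate detections. The class name `RANTracklet`, the plain tanh recurrence (the paper uses an LSTM), the fixed window size, and the Gaussian scoring are all illustrative assumptions, not the authors' implementation: the external memory is simply a buffer of the trajectory's last K input features, and the internal memory is a recurrent state that emits autoregressive coefficients over that buffer to predict the next observation.

```python
import numpy as np

class RANTracklet:
    """Hypothetical simplification of one RAN-style tracklet.

    External memory: the last K input feature vectors of this trajectory.
    Internal memory: a recurrent hidden state that produces autoregressive
    coefficients over the external memory; the coefficient-weighted sum of
    stored inputs is the predicted feature of the next detection.
    """

    def __init__(self, feat_dim, window=10, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.K = window
        self.external = []                                 # external memory
        # A plain tanh-RNN stands in for the LSTM used in the paper.
        self.W_xh = rng.normal(0.0, 0.1, (hidden, feat_dim))
        self.W_hh = rng.normal(0.0, 0.1, (hidden, hidden))
        self.W_ha = rng.normal(0.0, 0.1, (window, hidden))  # hidden -> AR coeffs
        self.h = np.zeros(hidden)                          # internal memory

    def update(self, x):
        """Store input x in the external memory, advance the internal memory."""
        self.external.append(x)
        if len(self.external) > self.K:
            self.external.pop(0)
        self.h = np.tanh(self.W_xh @ x + self.W_hh @ self.h)

    def predict(self):
        """Predicted next feature: softmax AR coefficients over stored inputs."""
        mem = np.stack(self.external)                      # (m, feat_dim), m <= K
        logits = (self.W_ha @ self.h)[: len(mem)]
        coeffs = np.exp(logits) / np.exp(logits).sum()
        return coeffs @ mem

    def score(self, detection, sigma=1.0):
        """Gaussian log-likelihood of a candidate detection under the prediction."""
        diff = detection - self.predict()
        return -0.5 * np.sum(diff ** 2) / sigma ** 2


if __name__ == "__main__":
    trk = RANTracklet(feat_dim=4)
    for t in range(5):
        trk.update(np.ones(4) * t)                 # stand-in appearance features
    near = np.ones(4) * 4.1                        # candidate close to the prediction
    far = np.ones(4) * 20.0                        # candidate far from the prediction
    print(trk.score(near) > trk.score(far))        # True: closer candidate scores higher
```

In the full method, per-detection scores of this kind (computed jointly over appearance and motion) would feed a frame-by-frame assignment step between existing trajectories and new detections; the sketch above only illustrates the memory coupling for a single trajectory.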