Tracking-by-detection is a common approach to the multiple object tracking (MOT) problem. In this paper we show how deep metric learning can be used to improve three aspects of tracking-by-detection. We train a convolutional neural network in a Siamese configuration, offline, on a large person re-identification dataset to learn an embedding function. The learned embedding is then used to improve online tracking performance while retaining a high frame rate. We use this learned appearance metric to robustly build estimates of pedestrians' trajectories in the MOT16 dataset. Breaking with the tracking-by-detection model, we also use the appearance metric to propose detections when the detector fails, using the predicted state of a tracklet as a prior. This method achieves competitive results in evaluation, especially among online, real-time approaches. We present an ablation study showing the impact of each of the three uses of our deep appearance metric.
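To make the role of a learned appearance metric concrete, the following is a minimal sketch of how embedding vectors might be used to associate tracklets with detections. It is an illustration under stated assumptions, not the method of the paper: the function names `cosine_distance` and `associate`, the choice of cosine distance, the greedy matching strategy, and the `max_distance` threshold of 0.3 are all hypothetical, and the embeddings are assumed to have already been produced by a trained network.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two embedding vectors (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Degenerate embedding: treat it as maximally uninformative.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def associate(track_embeddings, detection_embeddings, max_distance=0.3):
    """Greedily match tracklets to detections by ascending appearance distance.

    Hypothetical helper: returns a dict mapping track index -> detection index.
    Pairs whose distance exceeds max_distance (an assumed threshold) are never
    matched, so some tracklets and detections may remain unassigned.
    """
    candidates = []
    for t, track_vec in enumerate(track_embeddings):
        for d, det_vec in enumerate(detection_embeddings):
            dist = cosine_distance(track_vec, det_vec)
            if dist <= max_distance:
                candidates.append((dist, t, d))

    candidates.sort()  # closest appearance pairs first
    matches = {}
    used_detections = set()
    for dist, t, d in candidates:
        if t in matches or d in used_detections:
            continue
        matches[t] = d
        used_detections.add(d)
    return matches


if __name__ == "__main__":
    # Toy 2-D embeddings; real embeddings would come from the trained network.
    tracks = [[1.0, 0.0], [0.0, 1.0]]
    detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
    print(associate(tracks, detections))
```

In this toy example the third detection resembles neither tracklet, so it stays unmatched. A tracker could treat such a detection as a candidate for a new tracklet, and an unmatched tracklet as a case where the detector may have failed.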