This paper addresses the problem of selecting appearance features for multiple object tracking (MOT) in urban scenes. Over the years, a large number of features have been used for MOT, but it remains unclear whether some of them are better than others. Commonly used features include color histograms, histograms of oriented gradients, deep features from convolutional neural networks, and re-identification (ReID) features. In this study, we assess how well these features discriminate between objects enclosed by bounding boxes in urban tracking scenarios. We also assess how several affinity measures, namely the $\mathrm{L}_1$, $\mathrm{L}_2$, and Bhattacharyya distances, Rank-1 counts, and the cosine similarity, affect the discriminative power of the features. Results on several datasets show that features from ReID networks are the best at discriminating instances from one another, regardless of the quality of the detector. If a ReID model is not available, color histograms may be selected when the detector has good recall and there are few occlusions; otherwise, deep features are preferable because they are more robust to detectors with lower recall. The project page is http://www.mehdimiah.com/visual_features.
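As an illustration of the affinity measures named above, the following is a minimal sketch of how the $\mathrm{L}_1$, $\mathrm{L}_2$, and Bhattacharyya distances and the cosine similarity could be computed between two feature vectors, such as color histograms cropped from two bounding boxes. The function names are hypothetical and are not taken from the paper's code. The Bhattacharyya distance is written here as $-\ln \sum_i \sqrt{p_i q_i}$ over normalized histograms, which is one common definition; the variant used in the study may differ.

```python
import math


def l1_distance(p, q):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_distance(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def bhattacharyya_distance(p, q, eps=1e-12):
    """-ln of the Bhattacharyya coefficient of two non-negative histograms.

    Both inputs are normalized to sum to 1 first. The coefficient is
    clamped to eps (an assumed value) so that non-overlapping histograms
    give a large finite distance instead of infinity.
    """
    sum_p, sum_q = sum(p), sum(q)
    coefficient = sum(
        math.sqrt((a / sum_p) * (b / sum_q)) for a, b in zip(p, q)
    )
    return -math.log(max(coefficient, eps))


def cosine_similarity(p, q):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


if __name__ == "__main__":
    # Two toy 4-bin histograms standing in for appearance features.
    hist_a = [0.5, 0.5, 0.0, 0.0]
    hist_b = [0.0, 0.0, 0.5, 0.5]
    print(l1_distance(hist_a, hist_b))
    print(l2_distance(hist_a, hist_b))
    print(bhattacharyya_distance(hist_a, hist_b))
    print(cosine_similarity(hist_a, hist_b))
```

Note that the three distances decrease as two features become more alike, whereas the cosine similarity increases, so any comparison across measures has to account for the opposite orientation.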