A typical pipeline for multi-object tracking (MOT) is to use a detector for object localization, followed by re-identification (re-ID) for object association. This pipeline is partially motivated by recent progress in both object detection and re-ID, and partially motivated by biases in existing tracking datasets, where most objects tend to have distinctive appearance and re-ID models are sufficient for establishing associations. In response to such bias, we would like to re-emphasize that methods for multi-object tracking should also work when object appearance is not sufficiently discriminative. To this end, we propose a large-scale dataset for multi-human tracking, in which humans have similar appearance, diverse motion, and extreme articulation. As the dataset contains mostly group dancing videos, we name it "DanceTrack". We expect DanceTrack to provide a better platform to develop more MOT algorithms that rely less on visual discrimination and depend more on motion analysis. We benchmark several state-of-the-art trackers on our dataset and observe a significant performance drop on DanceTrack compared with existing benchmarks. The dataset, project code, and competition server are released at: \url{https://github.com/DanceTrack}.