Multi-object tracking (MOT) in sports scenes plays a critical role in gathering player statistics and supports further analysis, such as automatic tactical analysis. Yet existing MOT benchmarks pay little attention to this domain, which limits its development. In this work, we present a new large-scale multi-object tracking dataset covering diverse sports scenes, termed \emph{SportsMOT}, in which all players on the court are to be tracked. It consists of 240 video sequences, over 150K frames (almost $15\times$ MOT17) and over 1.6M bounding boxes ($3\times$ MOT17), collected from 3 sports categories: basketball, volleyball and football. Our dataset is characterized by two key properties: 1) fast and variable-speed motion and 2) similar yet distinguishable appearance. We expect SportsMOT to encourage MOT trackers to improve both motion-based association and appearance-based association. We benchmark several state-of-the-art trackers and find that the key challenge of SportsMOT lies in object association. To alleviate this issue, we further propose a new multi-object tracking framework, termed \emph{MixSort}, which adds a MixFormer-like structure as an auxiliary association model to prevailing tracking-by-detection trackers. By integrating this customized appearance-based association with the original motion-based association, MixSort achieves state-of-the-art performance on SportsMOT and MOT17. Based on MixSort, we give an in-depth analysis and provide insights into SportsMOT. The dataset and code will be available at https://deeperaction.github.io/datasets/sportsmot.html.
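As a rough illustration of what combining motion-based and appearance-based association can look like in a tracking-by-detection pipeline, the sketch below fuses a box-overlap score with an appearance similarity score and then matches tracks to detections. It is not the MixSort implementation: the weighted-sum fusion, the cosine similarity on toy embedding vectors (standing in for a learned MixFormer-like model), the greedy matching, and the names `associate`, `alpha` and `min_score` are all assumptions made for this example.

```python
from itertools import product


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sum(x * x for x in u) ** 0.5
    norm_v = sum(y * y for y in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def associate(tracks, detections, alpha=0.5, min_score=0.3):
    """Match tracks to detections using a fused motion/appearance score.

    tracks, detections: lists of (box, embedding) pairs.
    alpha: hypothetical weight on the motion (IoU) term; 1 - alpha goes
           to the appearance term. This fusion rule is an assumption.
    Returns a sorted list of (track_index, detection_index) pairs.
    """
    scored = []
    for (ti, (tbox, temb)), (di, (dbox, demb)) in product(
        enumerate(tracks), enumerate(detections)
    ):
        score = alpha * iou(tbox, dbox) + (1 - alpha) * cosine(temb, demb)
        if score >= min_score:
            scored.append((score, ti, di))

    # Greedy matching: take the highest-scoring pairs first and skip any
    # pair whose track or detection has already been used.
    scored.sort(reverse=True)
    used_tracks, used_dets, matches = set(), set(), []
    for _, ti, di in scored:
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((ti, di))
    return sorted(matches)


# Two players crossing paths: each detection overlaps most with the
# "wrong" track, but its embedding matches the right one.
tracks = [((0, 0, 10, 10), (1.0, 0.0)), ((4, 0, 14, 10), (0.0, 1.0))]
detections = [((1, 0, 11, 10), (0.0, 1.0)), ((3, 0, 13, 10), (1.0, 0.0))]

print(associate(tracks, detections, alpha=1.0))  # motion only
print(associate(tracks, detections, alpha=0.5))  # motion + appearance
```

In this toy case the motion-only setting pairs each track with the nearest box, while the fused score pairs each track with the detection that shares its appearance. Real trackers typically replace the greedy step with an optimal assignment solver and the toy embeddings with features from a learned model.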