This work focuses on player re-identification (re-id) in broadcast videos of team sports. Specifically, we focus on identifying the same player in images captured from different camera viewpoints at any given moment of a match. This task differs from traditional applications of person re-id in a few important ways. Firstly, players from the same team wear highly similar clothes, which makes them harder to tell apart. Secondly, there are only a few samples for each identity, which makes it harder to train a re-id system. Thirdly, the image resolutions are often quite low and vary widely. This, combined with heavy occlusions and fast player movements, greatly increases the difficulty of re-id. In this paper, we propose a simple but effective hierarchical data sampling procedure and a centroid loss function that, when used together, increase the mean average precision (mAP) by 7 to 11.5 and the rank-1 (R1) by 8.8 to 14.9 without any change in the network or hyper-parameters used. Our data sampling procedure improves the similarity of the training and test distributions, and thereby aids in creating better estimates of the centroids of the embeddings (or feature vectors). Surprisingly, our study shows that in the presence of severely limited data, as is the case for our application, a simple centroid loss function based on Euclidean distances significantly outperforms the popular triplet-centroid loss function. We show comparable improvements for both convolutional networks and vision transformers. Our approach is among the top-ranked methods on the SoccerNet Re-Identification Challenge 2022 leaderboard (test split), with an mAP of 86.0 and an R1 of 81.5. On the sequestered challenge split, we achieve an mAP of 84.9 and an R1 of 80.1. Research on re-id for sports-related applications is very limited, and our work presents one of the first discussions of it in the literature.
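To make the idea of a centroid loss based on Euclidean distances concrete, the following is a minimal sketch and not the paper's exact formulation, which the abstract does not give. It assumes the loss is the mean Euclidean distance between each embedding in a batch and the centroid of the embeddings that share its identity. The function name `centroid_loss` and the use of NumPy are illustrative choices.

```python
import numpy as np


def centroid_loss(embeddings, labels):
    """Mean Euclidean distance from each embedding to its identity centroid.

    Assumed formulation for illustration only; the paper may differ.
    embeddings: array-like of shape (N, D); labels: array-like of shape (N,).
    """
    embeddings = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)

    # Map every label to a contiguous index 0..K-1.
    unique_ids, inverse = np.unique(labels, return_inverse=True)
    inverse = inverse.reshape(-1)

    # Accumulate per-identity sums, then divide by per-identity counts.
    sums = np.zeros((len(unique_ids), embeddings.shape[1]))
    np.add.at(sums, inverse, embeddings)
    counts = np.bincount(inverse, minlength=len(unique_ids))[:, None]
    centroids = sums / counts

    # Distance of each sample to the centroid of its own identity.
    dists = np.linalg.norm(embeddings - centroids[inverse], axis=1)
    return dists.mean()


if __name__ == "__main__":
    # Two identities with two samples each (toy data, not from the paper).
    emb = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0], [5.0, 5.0]]
    ids = [1, 1, 2, 2]
    print(centroid_loss(emb, ids))
```

With only a few samples per identity, each centroid is estimated from very little data, which is why a sampling procedure that controls which images of an identity land in the same batch could matter for a loss of this kind.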