Identifying players in video is a foundational step in computer vision-based sports analytics. Player identities are essential for analyzing the game and feed downstream tasks such as game event recognition. Transformers are the current standard in Natural Language Processing (NLP) and are rapidly gaining traction in computer vision. Motivated by this success, we introduce a transformer network that recognizes players through their jersey numbers in broadcast National Hockey League (NHL) videos. The transformer takes temporal sequences of player frames (also called player tracklets) as input and outputs the probabilities of the jersey numbers present in the frames. The proposed network outperforms the previous benchmark on the dataset used. We implement a weakly-supervised training approach by generating approximate frame-level labels for jersey number presence and use these labels for faster training. We also use the player shifts available in the NHL play-by-play data: we read the game time with optical character recognition (OCR) to determine which players are on the ice at a given game time. Using player shifts improved player identification accuracy by 6%.
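One plausible way to apply the player-shift information is to restrict the network's jersey number probabilities to the players known to be on the ice at the OCR-read game time. The abstract does not say how the two are combined, so the following is a minimal sketch under that assumption. The function names, the dictionary format for the probabilities, and the fallback behaviour are all hypothetical and not taken from the paper.

```python
def restrict_to_on_ice(probs, on_ice_numbers):
    """Keep only jersey numbers on the ice and renormalize.

    probs: dict mapping jersey number -> predicted probability
           (hypothetical format for the network output).
    on_ice_numbers: set of jersey numbers on the ice at the current
           game time, e.g. looked up from play-by-play shift data.
    """
    masked = {n: p for n, p in probs.items() if n in on_ice_numbers}
    total = sum(masked.values())
    if total == 0:
        # Assumption: if no probability mass falls on an on-ice number,
        # fall back to the unfiltered prediction.
        return dict(probs)
    return {n: p / total for n, p in masked.items()}


def predict_number(probs, on_ice_numbers):
    """Return the most probable jersey number after shift filtering."""
    filtered = restrict_to_on_ice(probs, on_ice_numbers)
    return max(filtered, key=filtered.get)


# Illustrative values only: the raw prediction slightly favours 71,
# but number 71 is not on the ice, so the filtered prediction is 17.
tracklet_probs = {17: 0.40, 71: 0.45, 11: 0.15}
on_ice = {17, 11, 8}
print(predict_number(tracklet_probs, on_ice))
```

Filtering of this kind can only help when the game time is read correctly, so a real system would also need to handle OCR failures and shift changes that happen mid-tracklet.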