Action spotting in soccer videos is the task of identifying the specific time at which a key action of the game occurs. It has recently received considerable attention, and powerful methods have been introduced. Action spotting involves understanding the dynamics of the game, the complexity of events, and the variation of video sequences. Most approaches have focused on the last of these, since their models exploit global visual features of the sequences. In this work, we focus on the first, the game dynamics, by (a) identifying and representing the players, referees, and goalkeepers as nodes in a graph, and (b) modeling their temporal interactions as sequences of graphs. For the player identification (or player classification) task, we obtain an accuracy of 97.72% on our annotated benchmark. For the action spotting task, our method obtains an overall performance of 57.83% average-mAP when combined with other audiovisual modalities. This performance surpasses that of similar graph-based methods and is competitive with computationally heavy methods. Code and data are available at https://github.com/IPCV/soccer_action_spotting.
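To make the graph representation described in the abstract concrete, the following is a minimal sketch of how per-frame detections might be turned into a sequence of graphs. It is not the authors' implementation: the `Detection` type, the `ROLES` label set, the distance-based edge rule, and the `radius` value are all assumptions introduced for illustration, since the abstract does not specify how nodes are featurized or how edges are defined.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist

# Hypothetical label set; the abstract only names players, referees, and goalkeepers.
ROLES = ("player_team_a", "player_team_b", "goalkeeper", "referee")


@dataclass
class Detection:
    """One detected person in a frame (hypothetical structure)."""
    x: float   # position on the pitch, arbitrary units
    y: float
    role: str  # one of ROLES


def build_frame_graph(detections, radius=15.0):
    """Build one graph: a node per detection, an edge between nearby detections.

    The proximity rule is an assumption, not taken from the paper.
    """
    nodes = [(d.x, d.y, ROLES.index(d.role)) for d in detections]
    edges = [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(detections), 2)
        if dist((a.x, a.y), (b.x, b.y)) <= radius
    ]
    return nodes, edges


def build_graph_sequence(frames, radius=15.0):
    """Represent a clip as a list of per-frame graphs, in temporal order."""
    return [build_frame_graph(frame, radius) for frame in frames]


# Two toy frames with three detections each.
frames = [
    [Detection(10, 10, "player_team_a"), Detection(18, 16, "player_team_b"),
     Detection(60, 30, "referee")],
    [Detection(12, 11, "player_team_a"), Detection(50, 28, "player_team_b"),
     Detection(58, 30, "referee")],
]
sequence = build_graph_sequence(frames)
for t, (nodes, edges) in enumerate(sequence):
    print(t, len(nodes), edges)
```

In this toy clip the edge set changes between frames as the second player moves away from the first and toward the referee, which is the kind of temporal interaction a model operating on graph sequences could pick up.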