This paper presents the baseline method proposed for the Sports Video task, part of the MediaEval 2022 benchmark. The task comprises two subtasks: stroke classification from trimmed videos and stroke detection from untrimmed videos. The baseline addresses both. We propose two types of 3D-CNN architectures to solve the two subtasks. Both 3D-CNNs use spatio-temporal convolutions and attention mechanisms. The architectures and the training process are tailored to the subtask being addressed. The baseline is shared publicly online to help participants in their investigation and possibly to ease some aspects of the task, such as video processing, training method, evaluation, and submission routine. On the classification subtask, the baseline reaches an accuracy of 86.4% with our v2 model. On the detection subtask, it reaches a mAP of 0.131 and an IoU of 0.515 with our v1 model.
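The abstract reports an IoU score for the detection subtask without defining how it is computed. As a rough illustration only, the sketch below uses one common interval-based definition: the overlap between a predicted stroke segment and a ground-truth segment, divided by the span they jointly cover. The function name `temporal_iou` and the (start, end) frame convention with an exclusive end are assumptions made for this example, not the benchmark's official evaluation code.

```python
def temporal_iou(pred, truth):
    """Intersection over union of two frame intervals.

    Each interval is a (start, end) pair of frame indices, with `end`
    exclusive. This convention is an assumption made for illustration.
    """
    p_start, p_end = pred
    t_start, t_end = truth
    # Number of frames shared by both intervals (0 if they do not overlap).
    intersection = max(0, min(p_end, t_end) - max(p_start, t_start))
    # Frames covered by at least one of the two intervals.
    union = (p_end - p_start) + (t_end - t_start) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical segments: a prediction shifted by 5 frames.
    print(temporal_iou((0, 10), (5, 15)))
```

A predicted segment that matches the ground truth exactly scores 1.0, and disjoint segments score 0.0. Detection metrics such as mAP typically count a prediction as correct when this overlap exceeds a chosen threshold.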