MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions

Spatio-temporal action detection is an important and challenging problem in video understanding. Existing action detection benchmarks are limited either by the small number of instances per trimmed video or by their focus on relatively low-level atomic actions. This paper presents a new multi-person dataset of spatio-temporally localized sports actions, called MultiSports. We first analyze the key ingredients of a realistic and challenging spatio-temporal action detection dataset and propose three criteria: (1) motion-dependent identification, (2) well-defined boundaries, and (3) relatively high-level classes. Following these guidelines, we build MultiSports v1.0 by selecting 4 sports, collecting around 3,200 video clips, and annotating around 37,790 action instances with 907k bounding boxes. The dataset is characterized by strong diversity, detailed annotation, and high quality. With its realistic setting and dense annotations, MultiSports exposes the intrinsic challenges of action localization. To benchmark it, we adapt several representative methods to the dataset and give an in-depth analysis of what makes action localization difficult on it. We hope MultiSports can serve as a standard benchmark for spatio-temporal action detection. The dataset website is at https://deeperaction.github.io/multisports/.
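Each annotated action instance in a spatio-temporal dataset of this kind can be viewed as a tube: an action label plus one bounding box per frame between the action's start and end. To illustrate how such tubes are commonly compared when scoring detectors at the video level, the sketch below computes a spatio-temporal IoU as the temporal IoU of the two frame ranges multiplied by the mean per-frame box IoU over the frames both tubes share. This definition is assumed from earlier tube-based action detection work; it is not taken from the MultiSports evaluation code, and the names `ActionTube`, `box_iou`, and `tube_iou` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical box format, assumed for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class ActionTube:
    """Hypothetical container for one localized action instance."""
    label: str
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box

    @property
    def start(self) -> int:
        return min(self.boxes)

    @property
    def end(self) -> int:
        return max(self.boxes)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(t1: ActionTube, t2: ActionTube) -> float:
    """Assumed spatio-temporal IoU: temporal IoU times mean spatial IoU."""
    # Temporal IoU over inclusive [start, end] frame ranges.
    t_inter = min(t1.end, t2.end) - max(t1.start, t2.start) + 1
    if t_inter <= 0:
        return 0.0
    t_union = max(t1.end, t2.end) - min(t1.start, t2.start) + 1
    t_iou = t_inter / t_union
    # Mean box IoU over frames where both tubes have a box.
    shared = set(t1.boxes) & set(t2.boxes)
    if not shared:
        return 0.0
    s_iou = sum(box_iou(t1.boxes[f], t2.boxes[f]) for f in shared) / len(shared)
    return t_iou * s_iou


if __name__ == "__main__":
    # Ground truth on frames 0..9; prediction on frames 5..14, same box.
    gt = ActionTube("example", {f: (0.0, 0.0, 10.0, 10.0) for f in range(10)})
    pred = ActionTube("example", {f: (0.0, 0.0, 10.0, 10.0) for f in range(5, 15)})
    print(round(tube_iou(gt, pred), 3))  # 0.333
```

In the example the boxes agree perfectly, but only 5 of the 15 frames in the combined range overlap, so the score is 1/3. In a video-level evaluation a prediction would typically count as correct only if its label matches the ground truth and this score exceeds a chosen threshold; the label check and threshold are left out of the sketch.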
