Existing benchmarks for evaluating long video understanding fall short on multiple fronts, lacking either scale or annotation quality. These limitations arise from the difficulty of collecting dense annotations for long videos (e.g., actions, dialogues), which are typically obtained by manually labeling many frames per second. In this work, we introduce an automated Annotation and video Stream Alignment Pipeline (abbreviated ASAP). We demonstrate the generality of ASAP by aligning unlabeled videos of four different sports (Cricket, Football, Basketball, and American Football) with their corresponding dense annotations (i.e., commentary) freely available on the web. Our human studies indicate that ASAP can align videos with their annotations with high fidelity, precision, and speed. We then leverage ASAP's scalability to create LCric, a large-scale long video understanding benchmark, with over 1000 hours of densely annotated long Cricket videos (with an average sample length of 50 mins) collected at virtually zero annotation cost. We benchmark and analyze state-of-the-art video understanding models on LCric through a large set of compositional multi-choice and regression queries. We also establish a human baseline that indicates significant room for new research to explore. The dataset along with the code for ASAP and baselines can be accessed here: https://asap-benchmark.github.io/.