The 3rd annual installment of the ActivityNet Large-Scale Activity Recognition Challenge, held as a full-day workshop at CVPR 2018, focused on the recognition of daily-life, high-level, goal-oriented activities from user-generated videos, such as those found in internet video portals. The 2018 challenge hosted six diverse tasks that aimed to push the limits of semantic visual understanding of videos as well as bridge visual content with human captions. Three of the six tasks were based on the ActivityNet dataset, which was introduced at CVPR 2015 and is organized hierarchically in a semantic taxonomy. These tasks focused on tracing evidence of activities in time in the form of proposals, class labels, and captions. In this installment of the challenge, we hosted three guest tasks to enrich the understanding of visual information in videos. The guest tasks focused on complementary aspects of the activity recognition problem at large scale and involved three challenging and recently compiled datasets: the Kinetics-600 dataset from Google DeepMind, the AVA dataset from Berkeley and Google, and the Moments in Time dataset from MIT and IBM Research.