Along with the development of modern smart cities, human-centric video analysis is facing the challenge of analyzing diverse and complex events in real scenes. A complex event involves dense crowds, anomalous behaviors, or collective actions. However, limited by the scale of existing video datasets, few human analysis approaches have reported their performance on such complex events. To this end, we present a new large-scale dataset, named Human-in-Events or HiEve (Human-centric video analysis in complex Events), for understanding human motions, poses, and actions in a variety of realistic events, especially in crowded and complex scenes. It contains a record number of poses (>1M), the largest number of action instances (>56k) under complex events, and one of the largest collections of long-duration trajectories (with an average trajectory length of >480 frames). Based on this dataset, we present an enhanced pose estimation baseline that exploits action information to guide the learning of more powerful 2D pose features. We demonstrate that the proposed method boosts the performance of existing pose estimation pipelines on our HiEve dataset. Furthermore, we conduct extensive experiments to benchmark recent video analysis approaches together with our baseline methods, demonstrating that HiEve is a challenging dataset for human-centric video analysis. We expect that the dataset will advance the development of cutting-edge techniques in human-centric analysis and the understanding of complex events. The dataset is available at http://humaninevents.org
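The abstract only states that action information is used to guide the learning of 2D pose features, without specifying the architecture. The following is a minimal sketch, assuming a shared backbone with a pose-heatmap head and an auxiliary action-classification head whose loss influences the shared features; all module names, joint/action counts, and the weighting factor lambda_act are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class ActionGuidedPoseNet(nn.Module):
    """Toy two-branch network (illustrative only): a shared backbone feeds
    both a pose-heatmap head and an action-classification head, so the
    auxiliary action loss can guide the shared 2D pose features."""

    def __init__(self, num_joints=14, num_actions=14):
        super().__init__()
        # Shared convolutional backbone (stand-in for a real pose backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Pose branch: predicts one heatmap per joint.
        self.pose_head = nn.Conv2d(128, num_joints, kernel_size=1)
        # Action branch: global pooling + linear classifier over action labels.
        self.action_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_actions)
        )

    def forward(self, images):
        feats = self.backbone(images)
        return self.pose_head(feats), self.action_head(feats)


def joint_loss(pred_heatmaps, gt_heatmaps, pred_actions, gt_actions, lambda_act=0.1):
    # Joint objective: pose heatmap regression plus an auxiliary
    # action-classification term weighted by lambda_act (hypothetical value).
    pose_loss = nn.functional.mse_loss(pred_heatmaps, gt_heatmaps)
    action_loss = nn.functional.cross_entropy(pred_actions, gt_actions)
    return pose_loss + lambda_act * action_loss
```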