To promote the development of object detection, tracking, and counting algorithms in drone-captured videos, we construct a benchmark with a new large-scale drone-captured dataset, named DroneCrowd, formed by 112 video clips with 33,600 HD frames captured in various scenarios. Notably, we annotate 20,800 people trajectories with 4.8 million heads and several video-level attributes. Meanwhile, we design the Space-Time Neighbor-Aware Network (STNNet) as a strong baseline to solve object detection, tracking, and counting jointly in dense crowds. STNNet consists of a feature extraction module, followed by density map estimation heads and by localization and association subnets. To exploit the context information of neighboring objects, we design a neighboring context loss to guide the training of the association subnet, which enforces consistent relative positions of nearby objects in the temporal domain. Extensive experiments on our DroneCrowd dataset demonstrate that STNNet performs favorably against state-of-the-art methods.
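The neighboring context idea can be illustrated with a minimal sketch: for each head, the relative offsets to its k nearest neighbors at frame t should remain consistent at frame t+1. This is an illustrative reconstruction, not the paper's implementation; the function name, the L1 penalty, and the choice of k are assumptions for exposition.

```python
import numpy as np

def neighboring_context_loss(pos_t, pos_t1, k=2):
    """Hypothetical sketch of a neighboring context loss.

    pos_t, pos_t1: (N, 2) arrays of head positions in consecutive
    frames, where row i is the same identity in both frames.
    Penalizes (L1) the change of each object's relative offsets
    to its k nearest neighbors between the two frames.
    """
    n = len(pos_t)
    loss = 0.0
    for i in range(n):
        # distances from object i to all objects at frame t
        dist = np.linalg.norm(pos_t - pos_t[i], axis=1)
        nbrs = np.argsort(dist)[1:k + 1]      # k nearest neighbors, skip self
        off_t = pos_t[nbrs] - pos_t[i]        # relative offsets at frame t
        off_t1 = pos_t1[nbrs] - pos_t1[i]     # same identity pairs at frame t+1
        loss += np.abs(off_t - off_t1).sum()  # penalize inconsistent offsets
    return loss / n
```

Under a rigid translation of the whole crowd, every relative offset is preserved and the sketch returns zero, matching the intuition that nearby people tend to move coherently.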