Fight detection in videos is an emerging deep learning application, given today's prevalence of surveillance systems and streaming media. Previous work has largely relied on action recognition techniques to tackle this problem. In this paper, we propose a simple but effective method that solves the task from a new perspective: we design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator. Moreover, since collecting frame-level labels for videos is laborious, we design a weakly supervised two-stage training scheme, where we use a multiple-instance-learning loss computed on video-level labels to train the score generator, and adopt a self-training technique to further improve its performance. Extensive experiments on a publicly available large-scale dataset, UBI-Fights, demonstrate the effectiveness of our method, whose performance exceeds that of several previous state-of-the-art approaches. Furthermore, we collect a new dataset, VFD-2000, dedicated to video fight detection, with a larger scale and more diverse scenarios than existing datasets. The implementation of our method and the proposed dataset will be publicly available at https://github.com/Hepta-Col/VideoFightDetection.
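To make the weakly supervised objective concrete, below is a minimal sketch of a multiple-instance-learning ranking loss on video-level labels, in the style commonly used for weakly supervised video anomaly detection; the function name and the exact margin formulation are illustrative assumptions, not the paper's definitive loss.

```python
import torch

def mil_ranking_loss(pos_scores: torch.Tensor,
                     neg_scores: torch.Tensor,
                     margin: float = 1.0) -> torch.Tensor:
    """Hinge-style MIL ranking loss on video-level labels (illustrative sketch).

    pos_scores: per-segment anomaly scores of a fight (positive) video, shape (T_pos,)
    neg_scores: per-segment anomaly scores of a normal (negative) video, shape (T_neg,)

    The highest-scoring segment of the positive video should exceed the
    highest-scoring segment of the negative video by at least `margin`.
    """
    return torch.clamp(margin - pos_scores.max() + neg_scores.max(), min=0.0)

# Toy usage: scores would come from the anomaly score generator.
pos = torch.rand(32)  # segment scores for a fight video
neg = torch.rand(32)  # segment scores for a normal video
loss = mil_ranking_loss(pos, neg)
```

Because only video-level labels are needed, the loss supervises the score generator without frame-level annotation; a self-training stage can then refine it with pseudo-labels.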