Detecting Violence in Video Based on Deep Features Fusion Technique

From arXiv. The IIXth International Workshop on Representation, analysis and recognition of shape and motion FroM Imaging data (RFMI 2019), December 11-13, 2019, Sidi Bou Said, Tunis

With the rapid growth of surveillance cameras that monitor human activities in public places such as malls, streets, schools, and prisons, there is a strong demand for systems that detect violent events automatically. Automatic analysis of video to detect violence is significant for law enforcement. Moreover, it helps to prevent social, economic, and environmental damage. Most systems today still rely on human supervisors to detect violent scenes in video manually, which is inefficient and inaccurate. In this work, we focus on physical violence involving two or more persons. We propose a novel method to detect violence by fusing two significantly different convolutional neural networks (CNNs), AlexNet and SqueezeNet. Each network is followed by a separate Convolutional Long Short-Term Memory (ConvLSTM) that extracts robust, rich features from the video into its final hidden state. The two final hidden states are then fused and fed to a max-pooling layer. Finally, the features are classified using a series of fully connected layers and a softmax classifier. The performance of the proposed method is evaluated in terms of detection accuracy on three standard benchmark datasets: the Hockey Fight dataset, the Movie dataset, and the Violent Flow dataset. The method achieves accuracies of 97%, 100%, and 96%, respectively. A comparison with state-of-the-art techniques shows the promising capability of the proposed method for recognizing violent videos.
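To make the data flow of this pipeline concrete (two CNN streams, one ConvLSTM per stream, fusion of the final hidden states, max-pooling, fully connected layers, softmax), here is a minimal pure-Python sketch. It is not the authors' implementation and rests on several assumptions: the AlexNet and SqueezeNet backbones are replaced by a hypothetical one-layer stand-in (`backbone_stub`), each ConvLSTM has a single channel and omits peephole terms, the fusion is assumed to be channel-wise stacking (concatenation) because the abstract does not name the fusion operator, and all weights are random and untrained. Every helper name below is hypothetical.

```python
import math
import random

def conv2d_same(x, k):
    """Single-channel 2-D cross-correlation with zero padding; output has x's shape."""
    h, w, p = len(x), len(x[0]), len(k) // 2
    return [[sum(k[a][b] * x[i + a - p][j + b - p]
                 for a in range(len(k)) for b in range(len(k))
                 if 0 <= i + a - p < h and 0 <= j + b - p < w)
             for j in range(w)] for i in range(h)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def backbone_stub(frame, kernel):
    """Hypothetical stand-in for a CNN backbone (not AlexNet/SqueezeNet): one conv + ReLU."""
    return [[max(0.0, v) for v in row] for row in conv2d_same(frame, kernel)]

def convlstm_final_hidden(feature_maps, p):
    """Single-channel ConvLSTM without peepholes; returns the hidden state after the last frame."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    H = [[0.0] * w for _ in range(h)]  # hidden state
    C = [[0.0] * w for _ in range(h)]  # cell state
    for X in feature_maps:
        # Gate pre-activations: conv(input) + conv(previous hidden state) + bias,
        # all computed from the previous H before any state is updated.
        z = {g: [[a + b + p["b" + g] for a, b in zip(rx, rh)]
                 for rx, rh in zip(conv2d_same(X, p["Wx" + g]), conv2d_same(H, p["Wh" + g]))]
             for g in "ifoc"}
        for r in range(h):
            for c in range(w):
                i, f, o = (sigmoid(z[g][r][c]) for g in "ifo")
                C[r][c] = f * C[r][c] + i * math.tanh(z["c"][r][c])
                H[r][c] = o * math.tanh(C[r][c])
    return H

def max_pool2x2(x):
    """2 x 2 max-pooling with stride 2 (assumes even height and width)."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)] for i in range(0, len(x), 2)]

def dense(v, W, b):
    """Fully connected layer: W @ v + b."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(W, b)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def rand_matrix(rng, rows, cols, scale=0.3):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def rand_convlstm_params(rng):
    p = {}
    for g in "ifoc":  # input, forget, output gates and candidate cell
        p["Wx" + g], p["Wh" + g], p["b" + g] = rand_matrix(rng, 3, 3), rand_matrix(rng, 3, 3), 0.0
    return p

def classify_clip(frames, rng):
    """Two streams -> per-stream ConvLSTM -> fuse -> max-pool -> FC layers -> softmax."""
    k_a, k_b = rand_matrix(rng, 3, 3), rand_matrix(rng, 3, 3)  # two different stand-in backbones
    h_a = convlstm_final_hidden([backbone_stub(f, k_a) for f in frames], rand_convlstm_params(rng))
    h_b = convlstm_final_hidden([backbone_stub(f, k_b) for f in frames], rand_convlstm_params(rng))
    # Fusion (assumed): stack the two final hidden states as channels, then max-pool each one.
    pooled = [max_pool2x2(h) for h in (h_a, h_b)]
    features = [v for ch in pooled for row in ch for v in row]
    hidden = [max(0.0, v) for v in dense(features, rand_matrix(rng, 4, len(features)), [0.0] * 4)]
    return softmax(dense(hidden, rand_matrix(rng, 2, 4), [0.0, 0.0]))  # two class probabilities

rng = random.Random(0)
clip = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(5)]  # 5 frames, 4 x 4
probs = classify_clip(clip, rng)
print([round(p, 3) for p in probs])
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows how a clip flows from frames to two class scores (e.g. violent and non-violent). In a real system each stream would carry many channels from a pretrained backbone, and the ConvLSTMs and classifier would be learned end to end.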
