Analysis of Real-Time Hostile Activity Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms

Real-time video surveillance through CCTV camera systems has become essential for ensuring public safety, which is a priority today. Although CCTV cameras help considerably in increasing security, these systems require constant human interaction and monitoring. To address this issue, intelligent surveillance systems can be built using deep learning video classification techniques that automate surveillance and detect violence as it happens. In this research, we explore such techniques. Traditional image classification techniques fall short when classifying videos because they classify each frame separately, which causes the predictions to flicker. Therefore, many researchers are developing video classification techniques that consider spatiotemporal features. However, deploying deep learning models that rely on methods such as skeleton points obtained through pose estimation or optical flow obtained through depth sensors is not always practical in an IoT environment. Although these techniques yield higher accuracy, they are computationally heavier. Keeping these constraints in mind, we experimented with various video classification and action recognition techniques: ConvLSTM, LRCN (with both custom CNN layers and VGG-16 as feature extractor), CNN-Transformer, and C3D. We achieved a test accuracy of 80% on ConvLSTM, 83.33% on CNN-BiLSTM, 70% on VGG16-BiLSTM, 76.76% on CNN-Transformer, and 80% on C3D.
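To make the flicker problem concrete, the following is a minimal illustrative sketch, not the models evaluated in this work. It assumes a hypothetical list of per-frame "violence" scores and compares labelling each frame independently against labelling each frame from the mean of a short trailing window. The sliding-window mean is only a simple stand-in for temporal context; the architectures named above learn spatiotemporal features rather than averaging scores. The function names, scores, window size, and threshold are all assumptions chosen for illustration.

```python
def count_flips(labels):
    """Count how often consecutive labels differ (a simple flicker measure)."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def per_frame_labels(probs, threshold=0.5):
    """Label every frame on its own, ignoring neighbouring frames."""
    return [int(p >= threshold) for p in probs]


def windowed_labels(probs, window=5, threshold=0.5):
    """Label each frame from the mean score of the last `window` frames."""
    labels = []
    for i in range(len(probs)):
        start = max(0, i - window + 1)
        chunk = probs[start:i + 1]
        labels.append(int(sum(chunk) / len(chunk) >= threshold))
    return labels


# Hypothetical per-frame scores hovering around the decision threshold.
frame_probs = [0.40, 0.55, 0.45, 0.58, 0.44, 0.62, 0.48, 0.66, 0.70, 0.72]

independent = per_frame_labels(frame_probs)
smoothed = windowed_labels(frame_probs)

print(independent, count_flips(independent))  # 7 label changes
print(smoothed, count_flips(smoothed))        # 1 label change
```

On these made-up scores, the independent labels change seven times across ten frames, while the windowed labels change once. The trade-off is latency: the windowed decision switches a few frames later than the first high-scoring frame.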
