Real Time Action Recognition from Video Footage

The crime rate is rising in proportion to population growth. The most prominent response has been to introduce Closed-Circuit Television (CCTV) camera-based surveillance, and video surveillance cameras have added a new dimension to crime detection. Several research efforts on autonomous security camera surveillance are currently ongoing, with the fundamental goal of discovering violent activity in video feeds. From a technical viewpoint, this is a challenging problem: detecting violence means analyzing a set of frames, i.e., a video along its temporal dimension, and the machine learning model must be trained carefully to reduce false results. This research addresses the problem by integrating state-of-the-art Deep Learning methods into a robust pipeline for autonomous surveillance that detects violent activities, e.g., kicking, punching, and slapping. We first built a dataset for this purpose, which contains 600 videos (200 for each action). We then used existing pre-trained model architectures to extract features and a deep learning network for classification. We also report our models' accuracy and confusion matrices for different pre-trained architectures, namely VGG16, InceptionV3, ResNet50, Xception and MobileNet V2, among which VGG16 and MobileNet V2 performed better than the others.
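
The abstract reports accuracy and a confusion matrix for a three-class problem (kicking, punching, slapping). As a minimal sketch of how those two quantities relate, and not the authors' evaluation code, the following computes both from per-clip labels. The function names and the example labels are hypothetical and chosen only for illustration; the numbers are not results from the paper.

```python
# Class labels taken from the abstract; the ordering is an arbitrary choice.
ACTIONS = ["kicking", "punching", "slapping"]


def confusion_matrix(y_true, y_pred, labels=ACTIONS):
    """Return matrix[i][j] = number of clips of class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of clips on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up labels for illustration only; these are NOT results from the paper.
y_true = ["kicking", "kicking", "punching", "punching", "slapping", "slapping"]
y_pred = ["kicking", "punching", "punching", "punching", "slapping", "punching"]

cm = confusion_matrix(y_true, y_pred)
for label, row in zip(ACTIONS, cm):
    print(f"{label:>9}: {row}")
print(f"accuracy = {accuracy(cm):.3f}")
```

Reading the matrix row by row shows which actions are confused with one another, which a single accuracy figure cannot reveal. In this toy data, one kicking clip and one slapping clip are both mistaken for punching.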
