The fear and danger of the COVID-19 virus remain significant. Manual monitoring of social distancing norms is impractical when a large population is on the move and the personnel and resources to enforce them are insufficient. There is a need for a lightweight, robust, 24x7 video-monitoring system that automates this process. This paper proposes a comprehensive and effective solution that performs person detection, social distancing violation detection, face detection and face mask classification using object detection, clustering and a Convolutional Neural Network (CNN) based binary classifier. For this, YOLOv3, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Dual Shot Face Detector (DSFD) and a MobileNetV2-based binary classifier are employed on surveillance video datasets. This paper also provides a comparative study of different face detection and face mask classification models. Finally, a video dataset labelling method is proposed, along with a labelled video dataset that compensates for the lack of such datasets in the community and is used to evaluate the system. System performance is evaluated in terms of accuracy, F1 score and prediction time, which must be low for practical applicability. The system achieves an accuracy of 91.2% and an F1 score of 90.79% on the labelled video dataset, with an average prediction time of 7.12 seconds for 78 frames of video.
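As a rough illustration of how density-based clustering can flag social distancing violations, the sketch below runs a minimal pure-Python DBSCAN over person centroids and reports every person who falls into a cluster. It is not the paper's implementation: the abstract does not say how DBSCAN is configured, so the function names, the use of pixel distance between bounding-box centroids, the `min_samples=2` setting and the 50-pixel threshold are all assumptions made for this example. A real deployment would also need to correct for camera perspective.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE marks isolated points."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels


def distancing_violations(centroids, min_distance_px):
    """Indices of people closer than min_distance_px to someone else.

    Assumption: centroids are (x, y) pixel centres of person bounding
    boxes, e.g. taken from an object detector's output for one frame.
    """
    labels = dbscan(centroids, eps=min_distance_px, min_samples=2)
    return [i for i, label in enumerate(labels) if label != NOISE]


# Hypothetical centroids for one frame: two close pairs and one isolated person.
centroids = [(100, 200), (130, 210), (400, 220), (700, 500), (720, 515)]
print(distancing_violations(centroids, min_distance_px=50))  # [0, 1, 3, 4]
```

With `min_samples=2`, any two people within the threshold form a cluster, so every clustered index counts as a violation and noise points are treated as compliant.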