In response to the ongoing COVID-19 pandemic, we present a robust deep learning pipeline capable of identifying correct and incorrect mask-wearing in real-time video streams. To accomplish this goal, we devised two separate approaches and evaluated their performance and run-time efficiency. The first approach combines a pre-trained face detector with a mask-wearing image classifier trained on a large-scale synthetic dataset. The second approach uses a state-of-the-art object detection network to localize and classify faces in a single shot, fine-tuned on a small set of labeled real-world images. The first pipeline achieved a test accuracy of 99.97% on the synthetic dataset while maintaining 6 FPS on video data. The second pipeline achieved a mAP(0.5) of 89% on real-world images while sustaining 52 FPS on video data. We conclude that, if a larger dataset with bounding-box labels can be curated, this task is best addressed by object detection architectures such as YOLO and SSD, owing to their superior inference speed and satisfactory performance on key evaluation metrics.