Closed-circuit television (CCTV) systems are now essential for preventing security threats and dangerous situations, where early detection is crucial. Novel deep learning-based methods have made it possible to build automatic weapon detectors with promising results. However, these approaches rely on the visual appearance of the weapon alone. For handguns, body pose can be a useful cue, especially when the gun is barely visible. In this work, a novel method is proposed that combines weapon appearance and human pose information in a single architecture. First, pose keypoints are estimated to extract hand regions and to generate binary pose images, which serve as the model inputs. Then, each input is processed by a separate subnetwork, yielding two feature maps. Finally, these feature maps are combined to produce a prediction for the hand region. The results show that the combined model improves overall performance compared with using appearance alone, as popular methods such as YOLOv3 do.
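The abstract describes the pipeline only at a high level. The following plain-Python sketch illustrates three of its ingredients under explicit assumptions: cropping a hand region from pose keypoints, rasterizing a binary pose image, and fusing two feature vectors into one hand-region score. The keypoint names, the limb list, the hand-box heuristic (centre pushed a quarter of the forearm past the wrist, side equal to `scale` times the forearm length) and the concatenation-plus-logistic fusion head are all hypothetical choices made for illustration. They are not the authors' architecture, which uses learned subnetworks operating on image crops and feature maps.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical skeleton: keypoint pairs joined by a limb in the binary pose image.
LIMBS = [("shoulder", "elbow"), ("elbow", "wrist")]


def hand_box(elbow: Point, wrist: Point, scale: float = 1.0) -> Tuple[int, int, int, int]:
    """Square hand region (x0, y0, x1, y1) from elbow and wrist (assumed heuristic).

    The centre is pushed a quarter of the forearm past the wrist, along the
    elbow->wrist direction; the side is `scale` times the forearm length.
    No clipping to image bounds is done here.
    """
    dx, dy = wrist[0] - elbow[0], wrist[1] - elbow[1]
    forearm = math.hypot(dx, dy)
    cx, cy = wrist[0] + 0.25 * dx, wrist[1] + 0.25 * dy
    half = scale * forearm / 2
    return (round(cx - half), round(cy - half), round(cx + half), round(cy + half))


def bresenham(x0: int, y0: int, x1: int, y1: int) -> List[Tuple[int, int]]:
    """Integer pixels on the segment from (x0, y0) to (x1, y1), endpoints included."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return points


def binary_pose_image(keypoints: Dict[str, Point], width: int, height: int) -> List[List[int]]:
    """Draw each limb whose two keypoints were detected as 1-valued pixels on a 0 grid."""
    img = [[0] * width for _ in range(height)]
    for a, b in LIMBS:
        if a in keypoints and b in keypoints:
            (xa, ya), (xb, yb) = keypoints[a], keypoints[b]
            for x, y in bresenham(round(xa), round(ya), round(xb), round(yb)):
                if 0 <= x < width and 0 <= y < height:
                    img[y][x] = 1
    return img


def fuse_and_score(appearance: Sequence[float], pose: Sequence[float],
                   weights: Sequence[float], bias: float) -> float:
    """Concatenate both feature vectors and apply one logistic unit (assumed head)."""
    fused = list(appearance) + list(pose)
    if len(fused) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    z = sum(w * f for w, f in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

For a vertical arm with the shoulder at (2, 2), the elbow at (2, 6) and the wrist at (2, 10), `binary_pose_image` marks column 2 from row 2 to row 10, and `hand_box` places the hand crop just below the wrist. Concatenation is the simplest way to combine two branches; a real model would instead learn the fusion layers jointly with both subnetworks.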