Closed-circuit television (CCTV) systems are now essential for preventing security threats and dangerous situations, where early detection is crucial. Novel deep learning-based methods have made it possible to develop automatic weapon detectors with promising results. However, these approaches rely mainly on the visual appearance of the weapon alone. For handguns, body pose may be a useful cue, especially when the gun is barely visible. In this work, a novel method is proposed that combines weapon appearance and human pose information in a single architecture. First, pose keypoints are estimated to extract hand regions and generate binary pose images, which serve as the model inputs. Then, each input is processed by a separate subnetwork, and the outputs are combined to produce the handgun bounding box. The results show that the combined model improves on the state of the art in handgun detection, gaining between 4.23 and 18.9 AP points over the best previous approach.
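The abstract does not say how the hand regions are sized or how the binary pose images are drawn, so the following is only a minimal sketch of that preprocessing step under stated assumptions. It assumes keypoints are already available as (x, y) pixel coordinates from some pose estimator, that a hand region is a square centred on the wrist with a side proportional to the forearm length, and that the pose image is a 0/1 mask of limb segments. The names `LIMBS`, `hand_box`, and `binary_pose_image`, the three-keypoint skeleton, and the `scale` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical skeleton: pairs of indices into the keypoint list
# (shoulder -> elbow, elbow -> wrist). Not the paper's keypoint layout.
LIMBS = [(0, 1), (1, 2)]


def hand_box(wrist, elbow, image_shape, scale=1.0):
    """Return (x0, y0, x1, y1) for a square box centred on the wrist.

    Assumed heuristic: the side length is `scale` times the forearm
    length; the box is clipped to the image bounds.
    """
    wrist = np.asarray(wrist, dtype=float)
    elbow = np.asarray(elbow, dtype=float)
    half = 0.5 * scale * np.linalg.norm(wrist - elbow)
    h, w = image_shape
    x0 = int(max(0, np.floor(wrist[0] - half)))
    y0 = int(max(0, np.floor(wrist[1] - half)))
    x1 = int(min(w, np.ceil(wrist[0] + half)))
    y1 = int(min(h, np.ceil(wrist[1] + half)))
    return x0, y0, x1, y1


def binary_pose_image(keypoints, limbs, image_shape):
    """Rasterise limb segments into a 0/1 mask of shape `image_shape`.

    Each segment is drawn by sampling points along it and rounding to
    the nearest pixel; samples outside the image are dropped.
    """
    h, w = image_shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for a, b in limbs:
        p = np.asarray(keypoints[a], dtype=float)
        q = np.asarray(keypoints[b], dtype=float)
        n = int(np.ceil(np.abs(q - p).max())) + 1  # about one sample per pixel
        xs = np.rint(np.linspace(p[0], q[0], n)).astype(int)
        ys = np.rint(np.linspace(p[1], q[1], n)).astype(int)
        keep = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
        mask[ys[keep], xs[keep]] = 1
    return mask
```

Under these assumptions, the box from `hand_box` would be used to crop both the RGB frame and the mask from `binary_pose_image`, giving the two inputs that the abstract describes as being processed by separate subnetworks.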