Handgun detection using combined human pose and weapon appearance

CCTV surveillance systems are essential nowadays to prevent and mitigate security threats or dangerous situations such as mass shootings or terrorist attacks, in which early detection is crucial. These systems are manually supervised by a security operator, which has significant limitations. Novel deep learning-based methods have made it possible to develop automatic, real-time weapon detectors with promising results. However, these approaches rely on the visual appearance of the weapon only and exploit no additional contextual information. For handguns, body pose may be a useful cue, especially when the gun is barely visible, and it may also help to reduce false positives. In this work, a novel method is proposed that combines weapon appearance and 2D human pose information in a single architecture. First, pose keypoints are estimated to extract hand regions and generate binary pose images, which are the model inputs. Then, each input is processed by a different subnetwork to extract two feature maps. Finally, this information is combined to produce the hand region prediction (handgun vs. no handgun). A new dataset composed of samples collected from different sources has been used to evaluate model performance under different situations. Moreover, the robustness of the model to different brightness and weapon size conditions (simulating conditions in which appearance is degraded by low light and distance to the camera) has also been tested. The results show that the combined model substantially improves overall performance over approaches that rely on appearance alone, such as the popular YOLOv3.
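The input preparation step described in the abstract (hand regions and binary pose images derived from pose keypoints) can be illustrated with a minimal sketch. The abstract does not specify how these inputs are built, so everything here is an assumption made for illustration: the keypoint names, the `EDGES` list, the `offset` and `scale` parameters, and the idea of placing the hand centre by extending the forearm past the wrist.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical skeleton edges for one arm; real pose estimators define their own layout.
EDGES = [("shoulder", "elbow"), ("elbow", "wrist")]


def hand_region(elbow: Point, wrist: Point, scale: float = 1.0,
                offset: float = 0.5) -> Tuple[int, int, int, int]:
    """Return a square box (x0, y0, x1, y1) around an estimated hand centre.

    Assumption: the hand centre lies on the forearm line, `offset` forearm
    lengths past the wrist, and the box side equals `scale` forearm lengths.
    """
    dx, dy = wrist[0] - elbow[0], wrist[1] - elbow[1]
    forearm = (dx * dx + dy * dy) ** 0.5
    cx, cy = wrist[0] + offset * dx, wrist[1] + offset * dy
    half = scale * forearm / 2.0
    return (round(cx - half), round(cy - half), round(cx + half), round(cy + half))


def binary_pose_image(keypoints: Dict[str, Point], width: int,
                      height: int) -> List[List[int]]:
    """Rasterise the skeleton as 1-pixel lines on a zero background."""
    image = [[0] * width for _ in range(height)]
    for a, b in EDGES:
        if a not in keypoints or b not in keypoints:
            continue  # skip limbs whose keypoints were not detected
        (xa, ya), (xb, yb) = keypoints[a], keypoints[b]
        # Sample enough points along the segment to leave no gaps.
        steps = int(max(abs(xb - xa), abs(yb - ya))) + 1
        for i in range(steps + 1):
            t = i / steps
            x = round(xa + t * (xb - xa))
            y = round(ya + t * (yb - ya))
            if 0 <= x < width and 0 <= y < height:
                image[y][x] = 1
    return image


if __name__ == "__main__":
    pose = {"shoulder": (2, 2), "elbow": (2, 6), "wrist": (6, 6)}
    print(hand_region(pose["elbow"], pose["wrist"]))
    for row in binary_pose_image(pose, 8, 8):
        print("".join(str(v) for v in row))
```

In a full pipeline, the box returned by `hand_region` would be used to crop the appearance input from the frame, while the binary pose image would feed the second subnetwork. The feature extraction and fusion stages are not sketched here, since the abstract gives no detail on them.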
